ABSTRACT


Comparison  of  Neural  Network  and  Linear  Regression  Models  in  Statistically  Predicting 
Mental  and  Physical  Health  Status  of  Breast  Cancer  Survivors: 

Alicia  Ottati,  M.A.,  M.S.,  2015 

Thesis  directed  by:  Michael  Feuerstein,  PhD,  MPH,  Professor,  Department  of  Medical 
and  Clinical  Psychology 

In  the  U.S.,  there  are  currently  13.7  million  cancer  survivors  (38).  Many  cancer 
survivors  experience  problems  with  post-treatment  mental  and  physical  functioning. 
Although  research  has  identified  important  contributing  factors  regarding  these  problems, 
traditional  predictive  statistical  modeling  accounts  for  less  than  half  the  variance  in 
mental and physical function (16; 17; 113). The relationship among these factors may be
better  accounted  for  by  a  non-linear  modeling  approach.  The  goal  of  this  doctoral  study 
was  to  determine  whether  a  non-linear,  adaptive  predictive  model  demonstrated  better 
model  fit,  showed  greater  predictive  accuracy,  and  accounted  for  a  greater  contribution  of 
independent  variables  over  a  traditional  statistical  model  with  regard  to  mental  and 
physical  functioning  in  post-treatment  breast  cancer  survivors. 

Using  demographic,  medical,  and  clinical  variables,  linear  regression  was 
compared  to  neural  network  modeling  in  predicting  mental  functioning  and  physical 
functioning  in  a  sample  of  post-treatment  breast  cancer  survivors. 

Contrary to the a priori hypotheses, the neural network model did not outperform
the linear regression model in predicting mental and physical functioning of post-treatment
breast cancer survivors. However, both linear regression and neural network
modeling  identified  modifiable  variables  (clinical  domains  of  the  Cancer  Survivor 
Profile)  as  important  predictors  of  post-treatment  mental  and  physical  functioning,  with 
the  neural  network  confirming  the  findings  of  the  linear  regression  models.  The  neural 
network  model  also  added  to  the  results  of  the  linear  regression  by  identifying  additional 
important  variables  (age,  time  since  diagnosis)  that  may  have  a  non-linear  relationship 
with  mental  and  physical  functioning.  These  findings  may  promote  a  better 
understanding  of  post-treatment  health  status  and  promote  targeted  clinical  interventions. 

CHAPTER 1: Background


Introduction 

The  goal  of  the  present  study  was  to  evaluate  the  self-reported  mental  and 
physical  function  of  post-treatment  breast  cancer  survivors  using  a  traditional  statistical 
approach  (linear  regression)  and  a  neural  network  approach.  Although  linear  regression 
models  are  a  commonly  used  statistical  approach  in  cancer  survivor  research,  these 
models  may  not  account  for  the  full  variability  of  symptoms  in  the  post-treatment 
experience  (16;  17)  which  may  be  related  to  limitations  in  modeling  complex,  nonlinear 
relationships. Neural network models are nonlinear, adaptive predictive models. These
models  are  iterative  and  learn  from  the  characteristics  of  the  variables  used  in  the  model 
to  reduce  overall  error  in  the  model,  potentially  allowing  for  more  accuracy  and 
complexity  in  model  prediction.  As  a  result,  neural  network  models  may  provide  a  better 
model  fit,  more  predictive  accuracy,  and  better  sensitivity  to  the  relationships  among 
predictor  variables  and  measures  of  mental  and  physical  function.  More  accurate 
predictive  models  may  assist  researchers  in  identifying  optimal  areas  for  interventions 
that  improve  mental  and  physical  functioning  in  breast  cancer  survivors. 

The  two  statistical  approaches  (linear  regression  and  neural  network  analysis) 
were  compared  to  determine  which  method  provided  the  best  model  fit,  showed  the 
greatest  predictive  accuracy,  and  accounted  for  the  greatest  contribution  of  the 
independent  variables  in  statistically  predicting  the  mental  and  physical  function  of 
cancer survivors. Model fit was determined by comparing the mean square error (MSE)
of each predictive model, and mean absolute percentage error (MAPE) was used to
compare the predictive accuracy of each model. A global sensitivity analysis was
originally planned to determine the contribution of each independent variable to each
predictive  statistical  model;  however,  this  analysis  could  not  be  conducted  because  of 
limitations  in  the  neural  network  software  that  did  not  provide  a  method  to  hold  the 
contributions  of  the  independent  variables  in  the  full  neural  network  model  constant  for 
the  required  comparative  analyses.  Instead,  an  alternative  sensitivity  analysis  was 
conducted  comparing  squared  semipartial  correlation  coefficients  among  the  models  to 
determine  which  independent  variables  were  most  important  in  model  prediction. 
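
As an illustration of the comparison criteria described above, the following is a minimal sketch of how MSE, MAPE, and a squared semipartial correlation can be computed. It is not the software used in this study. The function names (`mse`, `mape`, `r_squared`, `squared_semipartial`) and the example values are hypothetical, and the squared semipartial correlation is shown only in its regression form, as the decrease in R-squared when one predictor is removed from the full model.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean square error: average squared gap between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no observed value is zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def r_squared(X, y):
    """R-squared of an ordinary least squares fit of y on X, with an intercept added."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))


def squared_semipartial(X, y, j):
    """Decrease in R-squared when predictor column j is dropped from the full model."""
    X = np.asarray(X, dtype=float)
    reduced = np.delete(X, j, axis=1)
    return r_squared(X, y) - r_squared(reduced, y)


# Hypothetical observed and predicted functioning scores for three survivors.
observed = [50, 40, 60]
predicted = [45, 44, 60]
print("MSE:", mse(observed, predicted))
print("MAPE (%):", mape(observed, predicted))
```

Lower MSE and lower MAPE indicate better fit and accuracy, respectively, while a larger squared semipartial correlation indicates a predictor that contributes more unique variance to the model.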

With  regard  to  the  population  studied,  breast  cancer  survivors  comprise  the  largest 
population  of  female  cancer  survivors  and  these  individuals  can  experience  difficulties 
with  post-treatment  mental  and  physical  function  (6;  19;  22;  38;  53;  111).  Extensive 
research  has  been  conducted  with  this  cancer  survivor  population  using  traditional  linear 
statistical  methods  to  identify  what  variables  predict  post-treatment  mental  and  physical 
functioning  (8;  16;  17;  33;  103);  however,  these  models  account  for  less  than  half  the 
variance  in  mental  and  physical  functioning  in  breast  cancer  survivors  (16;  17).  These 
findings  suggest  that  contributing  factors  for  over  half  of  the  variance  in  these  outcomes 
are still unknown. Neural network analyses use an adaptive, iterative modeling process
that  reduces  predictive  error  in  the  model  and  has  the  ability  to  identify  nonlinear, 
complex  relationships  which  may  better  account  for  this  additional  variance  and  identify 
important  modifiable  factors  that  may  respond  to  interventions.  To  date,  no  research  has 
evaluated  the  use  of  neural  network  models  in  predicting  mental  and  physical  functioning 
in  breast  cancer  survivors. 

This  doctoral  dissertation  project  consists  of  a  review  of  breast  cancer 
epidemiology, survivorship, stages, treatment, and health status. Predictive
statistical modeling is then discussed. These sections provide a framework to support the
need  for  this  study.  This  project  also  includes  an  outline  of  the  study’s  methodology, 
analytic  plan,  results,  clinical  implications,  and  recommendations  for  future  research. 

Epidemiology 

U.S.  epidemiological  data  estimate  that  more  than  1.6  million  individuals 
(855,220  men;  810,320  women)  received  a  diagnosis  of  cancer  in  2014  (5).  Among 
U.S.  women,  the  three  most  commonly  diagnosed  cancer  types  in  2014  were  projected  to 
be  breast,  lung/bronchus,  and  colorectal  cancer  (5).  Incident  female  breast  cancer  cases 
were  projected  in  232,670  women  which  accounts  for  29%  of  all  new  female  cancers  (5; 
56;  102).  The  median  age  of  women  diagnosed  with  breast  cancer  is  61  (56). 

Although  diagnoses  of  lung,  breast,  prostate,  and  colorectal  cancer  comprise  the 
largest  group  of  incident  cancer  types  in  the  U.S.,  deaths  of  individuals  with  lung  cancer 
far  exceed  deaths  of  individuals  with  breast  cancer  (102).  As  a  result,  female  breast 
cancer  survivors  account  for  22%  of  all  cancer  survivors  in  the  U.S.  and  comprise  the 
largest  cancer  survivor  group  (with  prostate  cancer  survivors  accounting  for  20%  of  all 
survivors)  (38).  Breast  cancer  survivors  can  also  report  difficulties  with  post-treatment 
health  status  such  as  mental  and  physical  functioning.  Identifying  which  factors  are 
significant  predictors  of  post-treatment  health  status  is  an  important  step  in  being  able  to 
develop  effective  interventions  to  improve  mental  and  physical  functioning,  especially 
with  regard  to  factors  that  are  modifiable.  Research  evaluating  health  status  in  breast 
cancer  survivors  revealed  that  significant  predictors  of  mental  function  included  age, 
being  partnered,  income,  race,  cognitive/mood  status,  social/emotional  status,  fatigue, 
fear  of  recurrence,  sleep  difficulties,  and  number  of  chronic  conditions  (8;  16;  17;  24;  33; 
103). Significant predictors of physical function included age, income, race, employment
status,  anticancer  treatment  received,  menopausal  status,  social/emotional  status,  diet  and 
exercise,  fear  of  recurrence,  fatigue,  dizziness,  urinary  incontinence,  lymphedema, 
children  in  the  home,  and  number  of  chronic  conditions  (8;  16;  17;  24;  33).  However, 
research  studies  examining  post-treatment  health  status  in  breast  cancer  survivors  often 
use  traditional  statistical  approaches  such  as  linear  regression  which  account  for  less  than 
half the variance in mental and physical functioning in this population (8; 16; 17; 33;
103). Neural network modeling of these same factors may produce a more accurate
predictive  model.  Therefore,  breast  cancer  survivors  represent  an  important  population  to 
understand  post-treatment  mental  and  physical  functioning. 

Cancer  Survivorship 

There  are  currently  13.7  million  cancer  survivors  in  the  U.S.  (38)  and  this 
population  is  growing,  largely  due  to  advances  in  cancer  detection  and  treatment  (101; 
102).  Cancer  survivorship  can  be  defined  in  many  ways.  Many  advocacy  groups  use  an 
expansive  definition  of  survivorship  that  starts  from  the  time  of  diagnosis  until  the  end  of 
life  and  extends  not  only  to  the  cancer  patient,  but  also  to  his  or  her  family,  friends,  and 
caregivers  (55).  However,  because  the  experiences  of  diagnosis  and  active  treatment  can 
differ  from  experiences  in  post-treatment  (66),  the  most  common  definition  of  cancer 
survivorship  used  in  research  refers  to  an  individual  who  has  completed  primary 
treatment  (79).  For  the  purposes  of  this  study,  cancer  survivors  were  defined  as  those 
individuals  who  are  living  with  a  history  of  cancer  and  have  completed  primary 
anticancer  treatment. 

There are 28.8 million cancer survivors worldwide (15.2 million women and 13.5
million  men)  (18).  Of  these  cancer  survivors,  breast  cancer  accounts  for  34%  (5.2 
million)  of  all  female  cancers  (18).  In  the  U.S.,  the  female  breast  cancer  survivor 
population  is  2.9  million,  accounting  for  22%  of  all  cancer  survivors  and  41%  of  all 
female  cancer  survivors  (38;  101). 

Cancer  survivorship  can  be  characterized  by  various  long-term  and  late  effects  of 
the  cancer  and/or  anticancer  treatment  that  may  not  manifest  until  months  or  years 
following  the  end  of  primary  treatment  (55).  Late  and  long-term  effects  of  cancer 
survivorship  can  be  affected  by  stage  at  diagnosis,  time  since  diagnosis,  treatment 
received,  time  since  treatment,  age,  race,  education,  socioeconomic  status,  social  support, 
and  health  status  (15;  54).  The  above  cancer  survivorship  research,  as  well  as  experience 
with  previous  studies  of  breast  cancer  survivorship  (19;  22;  53;  112),  informed  the 
decision  to  include  the  following  as  independent  variables  in  the  current  doctoral 
dissertation  project:  age,  race,  education,  partner  status,  employment,  income,  stage  at 
diagnosis,  time  since  diagnosis,  treatment  received,  adjuvant  treatment,  time  since 
treatment,  menopausal  status,  symptom  burden,  function,  health  behavior,  and  health 
service  needs  (see  Table  1). 

Stages 

Cancer  stage  at  diagnosis  can  impact  the  late  and  long-term  effects  experienced 
by  breast  cancer  survivors.  Staging  is  a  classification  method  used  to  describe  the  extent 
of cancer in the body (2). The standardized guidelines for specific cancer staging,
referred to as the TNM (tumor, node, metastasis) staging system, are outlined by the
American Joint Committee on Cancer (AJCC) (7). T indicates the size of the primary
tumor by numbers 0 to 4 with higher numbers indicating a larger mass. N describes the
spread  of  the  tumor  to  regional  lymph  nodes  using  numbers  0  to  3  as  categorical 
representations  of  the  number  of  nodes  affected  (higher  numbers  indicate  a  greater 
number  of  affected  nodes).  M  denotes  whether  the  cancer  has  metastasized,  or  spread,  to 
distant  organs  or  lymph  nodes  with  0  representing  no  distant  spread  and  1  representing 
distant  spread.  Specific  cancer  stages  are  established  based  on  the  type  of  cancer  and  the 
TNM  grades  assigned.  Although  there  are  standardized  staging  guidelines,  staging 
systems  vary  depending  upon  the  specific  cancer  type. 

Stages  of  breast  cancer  range  from  0  to  IV  and  are  assigned  based  on  the  AJCC 
guidelines  (7).  Stage  0  breast  cancer  indicates  the  earliest  form  of  breast  cancer  known  as 
carcinoma  in  situ  (CIS)  which  describes  a  cancerous  tumor  that  is  localized  to  the  cells  of 
the  breast  ducts  or  lobules  (4).  Stage  I  breast  cancer  describes  a  primary  tumor  that  is  2 
centimeters  or  less  in  size  with  or  without  micrometastases  (i.e.,  metastases  to  localized 
tissue  nodes)  to  axillary  lymph  nodes.  Stage  II  breast  cancer  is  diagnosed  for  primary 
tumors  that  are  less  than  2  centimeters  but  with  a  greater  level  of  axillary  node 
micrometastases  than  Stage  I,  primary  tumors  between  2-5  centimeters  with  or  without 
axillary  node  micrometastases,  or  tumors  that  are  larger  than  5  centimeters  without  any 
indication  of  axillary  node  micrometastases.  Stage  III  breast  cancer  consists  of  a  primary 
tumor  less  than  5  centimeters  around  with  a  higher  level  of  axillary  node  micrometastases 
than  that  of  Stage  II,  a  primary  tumor  greater  than  5  centimeters  with  micrometastases  to 
axillary  or  mammary  lymph  nodes,  a  primary  tumor  that  has  invaded  the  wall  of  the  chest 
or  skin  with  or  without  micrometastases  to  axillary  or  mammary  lymph  nodes,  or  a 
primary  tumor  of  any  size  with  micrometastases  to  the  clavicle  and/or  a  higher  level  of 
micrometastases to axillary or mammary lymph nodes. Stage IV breast cancer (also
referred  to  as  metastatic  breast  cancer)  describes  cancer  of  any  size  that  has  metastasized, 
or  spread,  to  distant  organs  or  lymph  nodes. 

Stage  at  diagnosis  is  an  important  factor  to  include  in  this  study  of  post-treatment 
breast  cancer  survivors  because  it  is  associated  with  intensity  and  type  of  anticancer 
treatment  received  which  can  have  an  effect  on  mental  and  physical  functioning. 
Specifically,  early-stage  breast  cancer  survivors  (i.e.,  stage  0  to  II)  often  receive  lower 
levels  of  chemotherapy  and  radiation  than  those  who  are  later-stage  at  diagnosis  (i.e., 
stage  III  to  IV)  (6).  Similarly,  women  with  later  stage  breast  cancer  are  more  likely  to 
receive  adjuvant  therapies  than  women  with  early  stage  breast  cancer  (6).  Although 
traditional  regression  models  have  been  used  to  evaluate  these  factors  in  previous 
research  (16;  24;  33),  the  adaptive,  iterative  nature  of  a  neural  network  analysis  may 
provide  a  more  accurate  model  of  the  relationship  among  stage  at  diagnosis,  anticancer 
treatment,  adjuvant  treatment,  and  mental  and  physical  functioning  in  this  population. 

Treatment 

Treatment  for  cancer  varies  based  on  prognostic  factors  such  as  familial  history  of 
cancer  and  stage  at  diagnosis,  as  well  as  patient  preference  (81-84).  General  primary 
treatment  options  include  no  intervention,  surgery,  radiation,  and/or  chemotherapy  (101). 
Adjuvant  therapy  refers  to  treatment  which  is  given  in  addition  to  primary  treatment  and 
designed  to  lessen  the  probability  of  disease  recurrence  or  metastases.  Adjuvant 
treatment  can  include  hormonal  therapies,  chemotherapy,  radiation,  or  other  treatments. 

Treatment  for  breast  cancer  includes  surgical  treatment  such  as  breast  conserving 
surgery  (lumpectomy)  or  mastectomy,  radiation  therapy,  or  chemotherapy  (81).  Because 
Stage 0 breast cancer (CIS) is non-invasive, the nature of treatment for this stage is
generally  conservative  compared  to  treatment  of  other  stages  of  breast  cancer.  Typical 
treatments  for  CIS  consist  of  various  surgical  approaches  such  as  excision  of  the  affected 
duct(s),  breast  conserving  surgery  or  lumpectomy  which  describes  a  wider  local  excision 
area,  or  mastectomy  (11).  Although  generally  considered  non-lethal  with  a  low  mortality 
rate,  treatment  of  CIS  is  important  because  CIS  is  the  precursor  to  potentially  invasive, 
lethal  forms  of  breast  cancer  (4;  64).  Adjuvant  hormonal  therapies  are  also  recommended 
in  the  treatment  of  breast  cancer  (81).  Tamoxifen  is  the  gold  standard  adjuvant  therapy 
for  premenopausal  women  with  hormone-receptor  (HR)  positive,  early  stage  breast 
cancer; whereas aromatase inhibitors are indicated for post-menopausal women with HR-positive,
early stage breast cancer (80; 94). Of women diagnosed with early stage breast
cancer  (stage  I  and  II),  57%  are  treated  with  breast  conserving  surgery,  36%  undergo 
mastectomy,  and  1%  receive  no  intervention  (101).  Most  women  with  an  early  stage 
diagnosis  who  undergo  breast  conserving  surgery  also  receive  adjuvant  therapy  with 
almost  50%  treated  with  adjuvant  radiation  and  33%  receiving  radiation  plus 
chemotherapy (101). Among women with late stage breast cancer (stage III and IV),
13% undergo breast conserving surgery, 60% are treated with mastectomy, 18% have no
surgery,  and  7%  receive  no  intervention  (101).  The  majority  of  women  with  a  late  stage 
breast  cancer  diagnosis  who  have  received  surgery  are  also  treated  with  chemotherapy  in 
addition  to  other,  unspecified  therapies  (101). 

Anticancer  treatment  and  adjuvant  therapy  have  shown  an  association  with  factors 
that  are  significant  predictors  of  mental  and  physical  functioning  in  breast  cancer 
survivors  such  as  memory  and  thinking  problems,  cancer-related  distress,  fatigue, 
changes in appetite/diet, lymphedema, pain, and sexual problems (6). Linear regression
models  have  also  demonstrated  a  statistically  significant  correlation  between  anticancer 
treatment  received  and  physical  functioning  (16;  24;  33).  However,  a  linear  regression 
model  which  revealed  treatment  to  be  a  significant  predictor  accounted  for  only  46%  of 
the  variance  in  the  physical  function  measure  (16).  The  adaptive,  iterative  nature  of  a 
neural  network  model  designed  to  identify  complex,  nonlinear  relationships  may  provide 
a  better  understanding  of  how  treatment  received  impacts  breast  cancer  survivors’  mental 
and  physical  health  status. 

Health  Status 

Health  status  may  generally  be  defined  by  two  categories,  mental  and  physical 
function.  A  recent  population-based  study  of  breast,  prostate,  and  colorectal  cancer 
survivors  >  1  year  post-diagnosis  demonstrated  that  cancer  survivors  endorsed  worse 
general  health  (p  <  0.001)  and  greater  activity  limitations  (p  <  0.004)  than  matched 
controls  (68).  A  systematic  review  of  post-treatment  breast,  prostate,  colorectal,  and 
gynecological  cancer  survivors  by  Harrington  and  colleagues  (54)  reported  that  the  most 
common  physical  and  psychological  symptoms  endorsed  were  depressive  symptoms, 
anxiety,  pain,  and  fatigue;  however,  sleep  problems,  sexual  difficulties,  and  cognitive 
limitations  were  also  reported. 

Linear  regression  studies  examining  mental  and  physical  post-treatment 
functioning  in  breast  cancer  survivors  accounted  for  36-41%  of  the  variance  in  mental 
function  and  38-46%  of  the  variance  in  physical  function  (16;  17).  These  findings 
suggest  that  a  significant  proportion  of  the  variance  in  these  factors  remains  unknown  and 
may  be  related  to  limitations  in  linear  regression  modeling.  Specifically,  linear 
regression approaches are bounded (i.e., not iterative or adaptive) and less able to model
complex,  nonlinear  relationships.  Neural  network  models  offer  a  statistical  approach  that 
is  adaptive  to  the  inputs  in  the  model  and  able  to  evaluate  complex,  nonlinear 
relationships  (48)  such  as  those  that  may  exist  among  the  demographic,  medical,  and 
clinical  variables  in  this  study  and  mental  and  physical  functioning  in  breast  cancer 
survivors. 

Mental  Function 

Although  many  cancer  survivor  populations  endorse  problems  with  mental 
functioning,  breast  cancer  survivors  report  worse  general  mental  functioning  than 
prostate  and  colorectal  cancer  survivors  (42;  122).  Breast  cancer  survivors  report 
difficulties  with  psychological  problems  such  as  depressive  symptoms,  anxiety,  and 
cognitive  problems  (54).  Depressive  symptoms  have  been  endorsed  by  30%  of  breast 
cancer  survivors  immediately  following  treatment  (40)  and  at  rates  of  21  -  48%  up  to  6 
months post-treatment (20; 35; 37; 40). Breast cancer survivors up to 6 months post-treatment
reported anxiety at rates of 45 - 48% (20; 35) and cognitive problems at rates of
31  -  61%  (31;  100;  106;  119).  A  prospective  study  of  health  outcomes  in  post-treatment 
breast  cancer  survivors  using  multiple  linear  regression  reported  the  following  factors 
were  statistically  significant  predictors  (p  <  .05)  of  mental  function  immediately 
following treatment: age at baseline, short temper, tendency to take naps, difficulty
concentrating,  early  awakening,  and  forgetfulness  (47). 

Physical  Function 

Physical  problems  reported  by  breast  cancer  survivors  at  post-treatment  include 
pain  and/or  functional  limitations  (e.g.,  lymphedema),  fatigue,  sleep  disturbance,  and 
sexual dysfunction (54). Pain and functional limitations have been reported by 12-79%
of  breast  cancer  survivors  studied  within  the  first  6  months  post-treatment  and  these 
symptoms  are  associated  with  type  of  treatment  received  (3;  34;  36;  65).  Symptoms  of 
fatigue  were  endorsed  at  the  end  of  treatment  (67)  through  6-months  post-treatment  for 
17-28%  of  breast  cancer  survivors  (9;  62).  Difficulties  with  sleep  were  endorsed 
immediately  following  treatment  to  8  weeks  post-treatment  by  35-54%  of  breast  cancer 
survivors  (34)  and  by  14%  of  breast  cancer  survivors  at  3-6  months  following  adjuvant 
treatment  (76).  Sexual  dysfunction  (e.g.,  vaginal  dryness,  pain,  or  decreased  desire)  also 
has  been  reported  by  21-63%  of  breast  cancer  survivors  studied  within  the  first  6  months 
following  primary  and/or  adjuvant  treatment  (34;  76).  One  prospective  study  of  physical 
functioning  in  post-treatment  breast  cancer  survivors  reported  that  adjuvant 
chemotherapy  had  a  greater  association  with  poor  physical  outcomes  (e.g., 
musculoskeletal  pain,  vaginal  problems,  difficulties  with  weight,  and  nausea)  than  did 
primary  treatment  modalities  (46).  Another  prospective  study  using  multiple  linear 
regression to evaluate health outcomes in breast cancer survivors immediately post-treatment
reported that the following symptoms were statistically significant predictors (p
<  .05)  of  physical  functioning:  lumpectomy  only,  lumpectomy  plus  chemotherapy,  age  at 
baseline,  breast  sensitivity,  aches  and  pains,  muscle  stiffness,  numbness  and  tingling,  and 
unhappiness  with  appearance  (47). 

Studies  evaluating  post-treatment  breast  cancer  survivors  have  used  linear 
regression  models  to  predict  post-treatment  health  status  (8;  16;  17;  33;  103).  These 
studies  have  revealed  important  demographic,  medical,  and  clinical  predictors  of  mental 
and  physical  function,  but  account  for  less  than  half  the  variance  in  these  measures  (16; 
17) suggesting that there may be more complex relationships among these variables that
are  not  captured  by  traditional  linear  statistical  modeling.  Neural  network  statistical 
approaches  are  adaptive  to  the  variables  included  in  the  model  and  better  able  to  identify 
nonlinear  relationships  (48).  As  a  result,  neural  network  analysis  may  provide  a  more 
accurate  model  in  predicting  post-treatment  mental  and  physical  functioning  as  compared 
to  linear  regression  models.  The  information  that  follows  provides  an  overview  of  these 
two  statistical  approaches. 

Predictive  Statistical  Models 

The  current  study  aimed  to  evaluate  whether  one  statistical  model  demonstrated  a 
better  model  fit,  showed  greater  predictive  accuracy,  and  accounted  for  a  greater 
contribution  of  the  independent  variables  over  the  other  statistical  model  with  regard  to 
predicting  mental  function  and  physical  function  in  post-treatment  breast  cancer 
survivors.  Although  linear  regression  is  considered  a  traditional  statistical  approach,  it 
was  hypothesized  that  the  neural  network  model  would  outperform  the  regression  model 
on  these  three  outcomes.  A  brief  overview  of  these  two  statistical  methods  is  provided  in 
the  following  sections. 

Linear  Regression 

Linear  regression  is  a  traditional  statistical  model  that  is  widely  used  to  predict  an 
outcome  based  on  a  set  of  predictor  (independent)  variables  (41).  Unlike  logistic 
regression  which  statistically  predicts  the  probability  of  a  case  falling  within  a  certain 
category  (dichotomous  dependent  variable),  linear  regression  techniques  calculate  the 
statistically  predicted  value  of  a  continuous  outcome  variable  (41).  Regression  models 
belong to the family of correlational techniques; however, regression uses a more
sophisticated statistical approach to determine the interrelationships among independent
and  dependent  variables  than  do  correlational  techniques  (90).  Once  the  data  points  of 
the  independent  and  dependent  variables  are  known,  the  data  points  can  be  graphed.  The 
independent  variable  data  points  can  be  plotted  on  a  horizontal  X-axis  and  the  dependent 
variable data points can be plotted on a vertical Y-axis. A regression line can then be
drawn  through  the  plotted  data  points  to  represent  the  line  of  best  fit  from  which  to  make 
predictions  about  the  values  of  the  dependent  variable  given  the  values  of  the 
independent  variable  (41).  In  this  regard,  the  dependent  variable  is  expressed  as  a  linear 
function  of  the  independent  variable  (41).  The  deviation  of  a  specific  data  point  from  the 
regression  line  is  referred  to  as  the  error  or  residual  value  in  the  model  (114).  Smaller 
deviations  of  these  data  points  from  the  predictive  regression  line  produce  an  overall 
model with smaller variability and greater predictive accuracy (114). Rather than
graphing  these  data  points,  a  basic  formula  can  also  be  calculated  to  determine  the 
regression  line.  The  basic  regression  formula  is  expressed  as  follows: 

$$Y = a + bx$$

where  Y  is  the  predicted  level  of  the  dependent  variable,  a  represents  the  regression 
constant  (or  intercept  or  the  value  of  Y  when  x  =  0),  b  is  the  slope  of  the  regression  line 
(amount  of  difference  in  Y  for  a  one-unit  change  in  x),  and  x  represents  the  value  of  the 
independent  variable  (41). 
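
To make the basic formula concrete, the following is a minimal sketch of how the intercept a and slope b can be estimated by ordinary least squares for a single predictor. The function names (`fit_simple_regression`, `predict`) and the data values are hypothetical and are not drawn from this study.

```python
def fit_simple_regression(x, y):
    """Return (a, b) for the least squares line Y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x_value):
    """Predicted level of the dependent variable for one value of x."""
    return a + b * x_value


# Hypothetical data: four observations of one predictor and one outcome.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
a, b = fit_simple_regression(x, y)

# Residuals are the deviations of each observed point from the fitted line.
residuals = [yi - predict(a, b, xi) for xi, yi in zip(x, y)]
print("intercept:", a, "slope:", b)
print("residuals:", residuals)
```

The residuals printed at the end correspond to the error values described above; smaller residuals indicate a line that fits the observed data points more closely.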

Multiple  linear  regression  is  an  extension  of  the  basic  formula  above  applied  to 
multiple  independent  variables  in  predicting  a  continuous  dependent  variable.  The 
formula  for  multiple  regression  is  expressed  as  follows: 

$$Y = a + b_i(x_i) + \varepsilon_i, \qquad i = 1, 2, 3, 4, \ldots, m$$

where Y is the predicted level of the dependent variable, a represents the regression
constant (or intercept or the value of Y when x = 0), $b_i$ is the regression coefficient
representing the unique contributions of each independent variable to the dependent
variable, $x_i$ represents the value of the independent variable, $\varepsilon_i$ is the error (or residual) in
the model, and m represents the number of independent variables (114). As with basic
linear  regression,  the  dependent  variable  in  a  multiple  linear  regression  is  expressed  as  a 
linear  function  of  the  independent  variables  and  smaller  deviations  from  the  predictive 
regression  line  in  a  multiple  linear  regression  model  represent  less  variability  and  greater 
predictive  accuracy  of  the  model.  See  Figure  1  for  a  conceptual  diagram  of  a  multiple 
linear  regression  model. 
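
The multiple regression formula can be sketched in the same way. The example below is an illustrative assumption rather than the procedure used in this study: it reads the formula as the intercept plus the sum of each coefficient times its predictor, and it estimates the coefficients with NumPy's least squares solver. The function names (`fit_multiple_regression`, `predict_multiple`) and the data are hypothetical.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Return (a, b): the intercept and one coefficient per predictor column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones lets the solver estimate the intercept a.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), coef[1:]


def predict_multiple(a, b, X):
    """Predicted values: intercept plus the weighted sum of the predictors."""
    return a + np.asarray(X, dtype=float) @ b


# Hypothetical data: five cases, two predictors (m = 2).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]

a, b = fit_multiple_regression(X, y)
residuals = np.asarray(y, dtype=float) - predict_multiple(a, b, X)
print("intercept:", a)
print("coefficients:", b)
print("residuals:", residuals)
```

Each entry of `b` plays the role of one regression coefficient in the formula, and the residuals are the model errors for the individual cases.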

[Diagram omitted. Labels: Input, Output, Target, Independent Variables, Dependent Variable.]

Figure 1. Model of Multiple Linear Regression.
From Sarle (1994).

Neural Network Analysis

Neural  network  analysis  was  developed  and  modeled  based  on  the  connectivity 
and  functions  of  neurons  in  the  brain  (48).  This  statistical  analysis  uses  an  adaptive, 
iterative  approach  that  learns  the  characteristics  of  the  variables  in  the  model  to  reduce 
overall  error  and  increase  the  predictive  accuracy  of  the  model.  Specifically,  this  analysis 
uses a model composed of highly interconnected nodes which process information to
determine predictions (10). A set of nodes (or neurons) is referred to as a layer, and a
basic neural network model consists of three layers: input, hidden, and output (48). See
Figure 2 for a conceptual diagram of a neural network model. The input layer nodes
represent the predictor variables (or independent variables) and are responsible for
sending the values of the predictor variables, along weighted connections, to the
hidden layer. Nodes (or neurons) in the hidden layer have a dual purpose. Hidden layer
nodes first compute a weighted sum of the inputs from the input layer (predictor variables). Next,
a  specific  function  algorithm,  referred  to  as  the  activation  function  or  activation  rule,  is 
applied  to  these  summed  values  (48).  Applying  this  activation  function  is  also  known  as 
squashing  the  inputs.  There  are  many  types  of  activation  functions  which  may  be  used  in 
a  neural  network  model,  but  the  most  common  algorithm  used  is  the  sigmoid  function  (a 
bounded  function  that  ranges  from  0  to  1)  (48).  Once  the  input  values  have  been 
squashed, the resulting values are weighted, summed again, and sent to the output layer. The output layer
nodes represent the predicted values of the outcome of interest (e.g., physical functioning,
mental functioning) (97) given the weighted inputs provided from the hidden layer. The
model then compares these predicted values to the target values (also referred to as the
dependent variables), which are known values, to determine the accuracy of the
predictions (97). The difference between the predicted values and the known values
represents the error in the neural network model (48).
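
A single pass through the three layers described above can be sketched in Python as follows. All weights, the bias terms, the input values, and the target are hypothetical numbers chosen for illustration. The bias terms and the linear (unsquashed) output node are assumptions that are common for continuous outcomes but are not specified in the text.

```python
import math


def sigmoid(z):
    """Sigmoid activation: squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One pass from the input layer through the hidden layer to one output node."""
    # Each hidden node sums its weighted inputs, then applies the activation.
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    # The output node sums the weighted hidden values (assumed linear output).
    output = output_bias + sum(v * h for v, h in zip(output_weights, hidden))
    return hidden, output


# Hypothetical case with two predictors and two hidden nodes.
x = [0.5, -1.0]
hidden_weights = [[1.0, 2.0], [-1.0, 0.5]]  # one row of weights per hidden node
hidden_biases = [0.0, 0.0]
output_weights = [2.0, -1.0]
output_bias = 0.5
target = 1.0  # known value of the dependent variable for this case

hidden, prediction = forward(x, hidden_weights, hidden_biases,
                             output_weights, output_bias)
error = target - prediction  # difference between known and predicted value
```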


[Figure 2 diagram; recoverable label: Hidden Layer]

Figure 2. Model of Neural Network.
Adapted from Sarle (1994).

There are many types of neural network models, but the most commonly used
model is backpropagation (115). Initially, the backpropagation procedure randomly
assigns small weights to the connections leaving the nodes in the input layer (i.e., the
layer containing the independent variables). A feedforward pass is then applied such
that the weighted inputs are sent forward (or provided) to the hidden layer to be summed and
squashed, and then these values are fed in (or provided) to the output layer, where the
output predictions can be compared to the target (dependent) variables to determine the
error in the model (48). Ultimately, the differences between the predicted output values
and the actual target values (i.e., dependent variables) are computed as errors that
are propagated back through the model to adjust the initial weights of the connections
between the layers with the aim of reducing the overall error in the model (48). These
errors are determined by a sum-of-squares error calculation. Multiple iterations of this
backpropagation method occur until the error between the predicted output values and
the actual target values (i.e., dependent variables) is minimized to achieve the smallest
amount of error in the model. As a result, a backpropagation neural network model uses
an iterative learning process to evaluate the data, minimize error, and make predictions.
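
The iterative procedure described above can be sketched as a small Python training loop. This is a minimal illustration under stated assumptions, not the procedure used by any particular software package: it assumes one hidden layer of sigmoid nodes, a linear output node, bias terms, full-batch gradient descent, and a fixed number of passes. The learning rate, the number of hidden nodes, the seed, and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, n_hidden=3, lr=0.05, epochs=2000, seed=0):
    """Backpropagation sketch; returns the sum-of-squares error of each pass."""
    rng = random.Random(seed)
    m = len(X[0])
    # Small random starting weights on the connections; biases start at zero.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0
    history = []
    for _ in range(epochs):
        gW = [[0.0] * m for _ in range(n_hidden)]
        gb = [0.0] * n_hidden
        gv = [0.0] * n_hidden
        gc = 0.0
        sse = 0.0
        for x, t in zip(X, y):
            # Feedforward pass: sum, squash, then sum again at the output node.
            h = [sigmoid(b[j] + sum(W[j][i] * x[i] for i in range(m)))
                 for j in range(n_hidden)]
            pred = c + sum(v[j] * h[j] for j in range(n_hidden))
            err = pred - t
            sse += err * err
            # Backward pass: gradients of 0.5 * err**2 for every weight.
            gc += err
            for j in range(n_hidden):
                gv[j] += err * h[j]
                delta = err * v[j] * h[j] * (1.0 - h[j])
                gb[j] += delta
                for i in range(m):
                    gW[j][i] += delta * x[i]
        history.append(sse)
        # Adjust the weights in the direction that reduces the error.
        c -= lr * gc
        for j in range(n_hidden):
            v[j] -= lr * gv[j]
            b[j] -= lr * gb[j]
            for i in range(m):
                W[j][i] -= lr * gW[j][i]
    return history


# Hypothetical non-linear toy problem: the outcome is the square of the predictor.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
y = [row[0] ** 2 for row in X]
history = train(X, y)
```

Comparing `history[0]` with `history[-1]` shows how far repeated passes reduce the sum-of-squares error on the training cases.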

In  a  comparative  review,  Tu  (115)  outlined  several  advantages  of  neural  network 
analysis  over  logistic  regression.  Although  logistic  regression  uses  a  sigmoid  curve 
(rather  than  a  straight  line)  to  represent  the  strength  of  the  relationship  between 
independent  and  dichotomous  dependent  variables  (41),  many  of  the  arguments 
articulated  by  Tu  are  still  applicable  to  a  comparison  of  neural  network  models  and  linear 
regression  models.  Because  neural  networks  require  less  formal  statistical  knowledge 
than linear regression, they may be easier to understand from a conceptual standpoint.
Neural networks also are capable of implicit detection of complex non-linear
relationships among variables in the model and are able to determine interactions
automatically. Additionally, neural networks are adaptable to various and multiple
training algorithms and activation functions, whereas linear regression models are
limited to one linear regression equation. Perhaps the most pronounced benefit of neural
network analysis, specifically the backpropagation method, is that it is adaptive. Neural
networks have the ability to minimize the error variance in the model by using an
iterative training process. This method increases the accuracy of the final model and
cannot be duplicated in linear regression. However, one drawback to this approach is that
multiple iterations may be prone to overfitting the model to the specific dataset, which can
reduce the generalizability of the results (115). Stopping rules may be applied (either
manually  or  automatically)  to  minimize  the  potential  for  overfitting  (60).  Early  stopping 
is  a  technique  in  which  the  researcher  stops  the  network  early  using  an  ad  hoc 
determination  point;  however,  research  suggests  that  early  stopping  may  result  in  a  less 
accurate  final  model  (91).  As  a  result,  other  stopping  rules  may  be  used  such  as  setting  a 
maximum  training  time  for  the  model  or  setting  the  model  to  train  toward  a  specified 
convergence  so  that  training  iterations  stop  once  there  is  little  added  accuracy  from  the 
training  (60).  The  stopping  rule  used  can  be  determined  in  advance  by  the  researcher,  or 
may  be  determined  by  the  software  package  used.  In  the  case  of  the  SPSS  Neural 
Network  program,  if  the  stopping  rule  is  automatically  determined  by  the  software,  then 
data  must  go  through  one  complete  round  of  analysis  by  the  model  before  stopping  rules 
are applied in a hierarchical fashion: maximum steps without a decrease in model error,
maximum training time, maximum training epochs (data passes), minimum relative
change  in  training  error,  and  minimum  relative  change  in  training  error  ratio  (60). 
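
The idea of checking stopping rules in a fixed order can be sketched in Python. This is a simplified illustration, not the SPSS implementation: the function name and all threshold values are hypothetical, and the rules based on wall-clock training time and on the training error ratio are omitted to keep the example deterministic.

```python
def first_stopping_rule(errors, max_steps_without_decrease=1,
                        max_epochs=100, min_relative_change=1e-4):
    """Return (epoch, rule) for the first rule triggered by a list of
    per-epoch training errors. Rules are checked in a fixed order after
    each completed pass through the data."""
    best = errors[0]
    steps_without_decrease = 0
    for epoch in range(1, len(errors)):
        previous, current = errors[epoch - 1], errors[epoch]
        if current < best:
            best = current
            steps_without_decrease = 0
        else:
            steps_without_decrease += 1
        # Rule 1: too many passes without a decrease in model error.
        if steps_without_decrease >= max_steps_without_decrease:
            return epoch, "no decrease in error"
        # Rule 2: maximum number of training epochs (data passes).
        if epoch >= max_epochs:
            return epoch, "maximum epochs"
        # Rule 3: the error is still falling, but by a negligible amount.
        if previous > 0 and abs(previous - current) / previous < min_relative_change:
            return epoch, "minimum relative change"
    return len(errors) - 1, "no rule triggered"


# Hypothetical error sequences, one per rule.
stalled = first_stopping_rule([1.0, 0.5, 0.4, 0.45, 0.3])
capped = first_stopping_rule([1.0, 0.9, 0.8, 0.7], max_epochs=2)
converged = first_stopping_rule([1.0, 0.5, 0.49999])
```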

The  many  benefits  of  neural  network  models  suggest  they  may  be  able  to  improve 
on  the  predictive  performance  of  linear  regression  models.  Specifically,  the  study 
rationale  that  follows  underscores  why  the  neural  network  analysis  is  likely  to  outperform 
linear  regression  modeling  in  predicting  breast  cancer  survivors’  post-treatment  mental 
and  physical  function. 

Study  Rationale 

Studies  of  cancer  survivors  which  have  used  measures  of  mental  and  physical 
functioning  as  outcome  measures  (such  as  the  Mental  Component  Summary  (MCS)  and 
the  Physical  Component  Summary  (PCS)  derived  from  the  SF-36)  have  examined  the 
contribution  of  a  wide  variety  of  factors  in  statistically  predicting  mental  and  physical 
functioning  (Appendix  1).  With  regard  to  mental  functioning,  factors  in  these  studies  that 
demonstrated  a  statistically  significant  association  included:  age,  race,  partner  status, 
income,  occupation,  comorbidity/number  of  chronic  conditions,  cognitive  status, 
social/emotional  status,  fear  of  recurrence,  fatigue,  insomnia,  and  changes  in  emotional 
support.  In  these  studies,  physical  functioning  showed  a  statistically  significant 
relationship  with  factors  such  as:  age,  race,  employment,  income,  occupation,  children 
younger  than  18  living  in  the  home,  stage  at  diagnosis,  time  since  diagnosis,  treatment 
received,  menopausal  status,  comorbidity/number  of  chronic  conditions,  dizziness, 
urinary  incontinence,  social/emotional  status,  caregiving/financial  status,  exercise  and 
diet,  fear  of  recurrence,  fatigue,  and  current  lymphedema.  These  studies  typically  used 
linear statistical approaches. However, there is little consistency across studies regarding
the contribution of these factors or which factors are most important to predict the mental
and  physical  functioning  of  cancer  survivors.  Studies  using  linear  regression  report  that 
the  variance  accounted  for  in  the  SF-36  component  summary  scores  by  linear  statistical 
models  ranges  from  36  -  41%  of  the  variance  in  mental  functioning  and  38  -  46%  of  the 
variance  in  physical  functioning  (16;  17).  These  ranges  suggest  that  contributing  factors 
for  about  half  of  the  variance  in  mental  and  physical  function  remain  unknown.  These 
findings  may  be  explained  in  part  by  the  linear  nature  of  the  models.  Specifically,  the 
diverse  findings  represented  in  Appendix  1  suggest  that  the  relationships  among  these 
factors  may  be  more  complex  than  those  relationships  that  can  be  obtained  through 
traditional  linear  approaches.  This  complexity  suggests  that  these  variables  may  in  fact 
be non-linearly associated and, therefore, unable to be fully accounted for by a linear
statistical  model.  A  non-linear  model  may  allow  for  a  more  complete  description  of 
the  complex  associations  among  the  independent  variables  and  the  outcomes  of 
interest.  Moreover,  a  non-linear  model  that  learns  and  adapts  to  the  factors  used  to 
construct  the  model  has  the  capacity  to  be  more  sensitive  by  reducing  overall  error  in  the 
model  and  increasing  predictive  accuracy. 

Although  linear  approaches  allow  the  specification  of  interaction  terms  to  detect 
interrelationships  among  independent  variables,  such  interaction  terms  become  difficult 
to  interpret  when  they  include  more  than  two  variables.  When  linear  approaches  are  used 
with  factors  that  truly  have  non-linear  relationships,  the  linear  model  will  underestimate 
the  true  relationships  between  the  predictor  and  outcome  variables,  increasing  the 
potential  for  Type  II  error  (87).  Additionally,  when  a  large  number  of  predictor  variables 
are included in a linear model, a common practice when studying more complex health
outcomes and typically seen with multivariate regression, this approach can increase the
likelihood  of  a  Type  I  error  (87). 

Linear  regression  is  a  commonly  used  approach  in  statistical  prediction  models. 
Linear  regression  models  are  useful  and  easy  to  interpret;  however,  they  demonstrate 
several  disadvantages  when  compared  to  neural  network  analysis.  Linearity  between  the 
predictor  and  outcome  variables  is  assumed  with  this  approach  because  a  linear  function 
is  used  to  determine  each  independent  variable’s  unique  contribution  to  the  outcome 
measure.  However,  the  inflexibility  of  a  linear  approach  diminishes  a  linear  model’s 
ability  to  accurately  predict  complex  interrelationships  among  variables,  making  this 
approach  unsuitable  if  the  relationship  between  the  predictor  and  outcome  variables  is  not 
truly  linear.  Additionally,  unlike  neural  network  models,  linear  regression  is  not 
adaptive.  Error  is  defined  as  the  difference  between  the  value  predicted  by  the  model 
and  the  true  outcome  value.  However,  linear  regression  does  not  adjust  the  original  input 
variable  weights  after  identifying  the  amount  of  error  in  the  model.  As  such,  there  may 
be  a  more  useful  statistical  approach  (other  than  a  linear  regression  model)  to  determine 
the  associations  among  demographic,  clinical,  and  medical  variables  and  mental  and 
physical  functioning.  A  predictive  model  that  learns,  or  adapts,  to  the  characteristics  of 
the  variables  used  to  build  the  model  might  be  more  adept  at  reducing  the  overall  error  in 
the  model  and  increasing  the  predictive  accuracy.  Such  a  model  may  be  more  sensitive, 
providing  a  more  accurate  picture  of  the  relationship  among  predictive  factors  and  the 
outcome  of  interest,  and  possibly  improving  the  ability  to  identify  specific  areas  for 
intervention. Neural network analysis is one type of adaptive learning model.

Neural network models are non-linear, adaptive statistical approaches to
understanding  complex  relationships  among  variables.  Such  models  may  be 
especially  useful  in  this  application  given  that  less  than  half  of  the  variance  in  mental  and 
physical  functioning  is  explained  by  linear  regression  models  (16;  17).  The  predictive 
plane  for  a  neural  network  can  range  on  a  continuum  of  non-linearity  from  slightly  to 
highly  non-linear.  The  hidden  layer  of  the  neural  network  includes  hidden  nodes,  each 
with  an  estimation  algorithm  (typically,  sigmoid  curves).  Multiple  hidden  nodes  in  the 
hidden  layer  of  the  model  indicate  multiple  estimation  algorithms  (e.g.,  sigmoid  curves). 
The  use  of  multiple  sigmoid  curves  in  a  neural  network  produces  a  non-linear  function 
which,  unlike  linear  regression,  creates  a  non-linear  predictive  line  that  can  assume 
varying  lengths  and  degrees  of  rotation.  This  non-linear  prediction  line  can  account  for 
non-linear  relationships  and  multiple  combinations  of  variables,  thus  reducing  error  and 
producing  a  more  accurate  predictive  model.  More  hidden  nodes  in  the  network  equate 
to  more  non-linear  estimation  algorithms  (e.g.,  sigmoid  curves),  which  in  turn  increases 
the  non-linearity  of  the  predictive  estimation  line.  A  conceptual  comparison  of  the  linear 
predictive  line  of  linear  regression  and  the  non-linear  predictive  line  of  a  neural  network 
is  presented  in  Figure  3.  This  conceptual  comparison  demonstrates  the  potential  of  a 
non-linear  model  to  reduce  overall  predictive  error  by  providing  flexibility  in  the 
predictive line.

Figure 3. Comparison of Linear and Non-linear Predictive Lines.

The  dashed  line  indicates  the  linear  predictive  line  of  a  multiple  linear  regression.  The 
solid  line  represents  the  non-linear  predictive  line  of  a  neural  network  model  for  the  same 
data points.

Neural networks are also adaptive to the inputs from the model. For example,
backpropagation neural networks are able to self-analyze their performance with regard
to the amount of error in the model’s prediction and then adjust that prediction to
minimize model error. If error between the predicted value and the actual
value is identified, then the neural network feeds this information back through
the model to adjust the connection weights applied to the predictor variables. Once the
weights are adjusted, the neural network runs the model again and assesses for error.
This process repeats for as long as decreases in error are evident, that is, until the error
in the model is satisfactorily minimized. In this manner, the neural network adapts to the
information it is provided, suggesting a more accurate statistical approach than standard
linear regression can offer.

Despite  the  benefits  of  neural  network  analysis,  few  researchers  examining  cancer 
survivorship  have  used  a  neural  network  approach,  with  the  exception  of  those 
researchers  studying  diagnostic  tests  for  cancer  or  mortality  from  cancer.  In  fact,  a 
literature  search  revealed  only  one  study  of  cancer  survivors  which  used  a  neural  network 
model  to  evaluate  post-surgical  function  and  quality  of  life  in  breast  cancer  survivors 
(114).  In  this  study,  the  authors  compared  the  performance  of  two  neural  network  models 
to  that  of  a  multiple  linear  regression  model  in  predicting  mental  and  physical  functioning 
following  breast  cancer  surgery.  The  neural  network  models  generally  demonstrated 
smaller  mean  square  errors  and  greater  predictive  accuracy  than  the  regression  model. 

The  current  study  aimed  to  examine  the  differential  predictive  utility  of  each  of 
these  models  (linear  regression  and  neural  network  analysis)  with  regard  to  health  status 
in a post-treatment sample of breast cancer survivors using demographic, medical, and
clinical predictor variables. Because the relationships among these predictor variables
and health status outcomes appear to be highly complex and likely non-linear, a
non-linear, adaptive statistical approach is warranted. A highly accurate non-linear statistical
model,  such  as  neural  network  analysis,  has  the  potential  to  clarify  these  relationships  and 
broaden  knowledge  of  post-treatment  mental  and  physical  functioning  by  reducing 
predictive error or identifying different relationships among variables.

CHAPTER 2: Specific Aims and Hypotheses


This  section  presents  the  specific  aims  and  hypotheses  associated  with  the  present  study. 

Specific  Aim  1 

To  determine  which  statistical  model  would  produce  the  best  model  fit  (as  defined  by  the 
model  with  the  smallest  mean  square  error  [MSE])  in  statistically  predicting  mental  and  physical 
functioning  in  breast  cancer  survivors. 
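
The model fit criterion named in this aim can be illustrated with a short Python sketch that computes the MSE for two sets of predictions and selects the smaller. The function name, the outcome values, and both sets of predictions are hypothetical and are not results from the study.

```python
def mean_square_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical outcome scores and predictions from two competing models.
actual = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 37.0]
model_b = [11.0, 19.0, 31.0, 45.0]

mse_a = mean_square_error(actual, model_a)
mse_b = mean_square_error(actual, model_b)
better_fit = "model_a" if mse_a < mse_b else "model_b"
```

Because errors are squared before averaging, a single large miss (such as the last prediction of the second model) is penalized heavily under this criterion.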

Hypothesis  1.1:  It  was  hypothesized  that  a  neural  network  model  would  demonstrate  a 
lower  MSE  than  linear  regression  in  statistically  predicting  mental  functioning  in  breast  cancer 
survivors. 

Hypothesis  1.2:  It  was  hypothesized  that  a  neural  network  model  would  demonstrate  a 
lower  MSE  than  linear  regression  in  statistically  predicting  physical  functioning  in  breast  cancer 
survivors. 

Rationale  for  Hypotheses  1.1  and  1.2:  Unlike  the  linear  predictive  line  of  a  linear 
regression,  the  non-linear  predictive  line  produced  by  a  neural  network  allows  the  statistical 
model  to  more  closely  predict  the  values  of  the  target  (dependent)  variables  (see  Figure  3).  This 
non-linear  predictive  line  becomes  especially  useful  when  dealing  with  complex 
interrelationships  among  variables  (115).  Furthermore,  a  neural  network  is  adaptive  because  it 
undergoes  a  repetitive,  iterative  learning  process  designed  to  teach  the  model  about  patterns  in 
the data and decrease the overall error in the model (48; 115). Decreased error in the neural
network  predictions  improves  the  overall  fit  of  the  model.  Linear  regression  is  limited  in  this 
regard because it does not use an iterative process to improve model performance.

Specific Aim 2

To determine which statistical model would demonstrate greater predictive
accuracy (as defined by the model with the smallest mean absolute percentage error [MAPE]) in
statistically predicting mental and physical functioning in breast cancer survivors.
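
The predictive accuracy criterion named in this aim can be illustrated with a short Python sketch. It uses the common definition of MAPE as the average of the absolute errors expressed as a percentage of the actual values; the function name and all numbers are hypothetical and are not results from the study. One assumption is made explicit in the code: MAPE is undefined when an actual value is zero, so the sketch raises an error in that case.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of each actual value."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Division by a zero actual value would make the measure undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical outcome scores and predictions from two competing models.
actual = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 37.0]
model_b = [11.0, 19.0, 31.0, 45.0]

mape_a = mean_absolute_percentage_error(actual, model_a)
mape_b = mean_absolute_percentage_error(actual, model_b)
more_accurate = "model_a" if mape_a < mape_b else "model_b"
```

In this toy example the second model has the lower MAPE because its one large miss falls on the largest actual value, where it is a small percentage. A ranking by MAPE therefore need not agree with a ranking by squared error, which is one reason to examine both.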

Hypothesis  2.1:  It  was  hypothesized  that  a  neural  network  model  would  demonstrate  a 
lower  MAPE  than  linear  regression  in  statistically  predicting  mental  functioning  in  breast 
cancer  survivors. 

Hypothesis  2.2:  It  was  hypothesized  that  a  neural  network  model  would  demonstrate  a 
lower  MAPE  than  linear  regression  in  statistically  predicting  physical  functioning  in  breast 
cancer  survivors. 

Rationale  for  Hypotheses  2.1  and  2.2:  Unlike  the  linear  predictive  line  of  a  linear 
regression,  the  non-linear  predictive  line  produced  by  a  neural  network  allows  the  statistical 
model  to  more  closely  predict  the  values  of  the  target  (dependent)  variables  (see  Figure  3).  This 
non-linear  predictive  line  becomes  especially  useful  when  dealing  with  complex 
interrelationships  among  variables  (115).  Furthermore,  a  neural  network  is  adaptive  because  it 
undergoes  a  repetitive,  iterative  learning  process  designed  to  teach  the  model  about  patterns  in 
the data and decrease the overall error in the model (48; 115). Decreased error in the neural
network  predictions  improves  the  overall  fit  of  the  model.  Linear  regression  is  limited  in  this 
regard  because  it  does  not  use  an  iterative  process  to  improve  model  performance. 

Specific  Aim  3 

To  determine  which  statistical  model  would  account  for  the  greatest  independent 
variable  sensitivity  (as  determined  by  global  sensitivity  analysis)  in  statistically  predicting 
mental and physical functioning in breast cancer survivors.

Hypothesis 3.1: It was hypothesized that the full set of independent variables (Table 1)
in  the  neural  network  model  would  account  for  a  greater  global  sensitivity  ratio  than  the  same  set 
of  independent  variables  (Table  1)  in  the  linear  regression  model  with  regard  to  mental 
functioning  in  breast  cancer  survivors. 

Hypothesis  3.2:  It  was  hypothesized  that  the  full  set  of  independent  variables  (Table  1) 
in  the  neural  network  model  would  account  for  a  greater  global  sensitivity  ratio  than  the  same  set 
of  independent  variables  (Table  1)  in  the  linear  regression  model  with  regard  to  physical 
functioning  in  breast  cancer  survivors. 

Rationale for Hypotheses 3.1 and 3.2: Studies using linear statistical models to predict
mental and physical function in cancer survivors report that these linear models account for less
than half the variance in each of these measures (16; 17; 113). These figures suggest that contributing
factors for over half of the variance in these measures remain unknown. However, these
findings may be explained in part by the linear nature of the models, which may not be able to
account for the complex relationships among the variables studied. Additionally, when linear
approaches are used with factors that truly have non-linear relationships, the linear model will
underestimate the true relationships between the predictor and outcome variables (87). A
non-linear model, such as a neural network, may allow for a more complete description of the
associations among the independent variables and the outcomes of interest.

Table 1. List of Independent Variables

Demographic       Medical                 Clinical (CSPro Domains)
Age               Stage at Diagnosis      Symptom Burden
Race              Time since Diagnosis    Function
Education         Treatment Received      Health Behavior
Partner Status    Adjuvant Treatment      Health Service Needs
Employment        Time since Treatment
Income            Menopausal Status

CHAPTER 3: Method


Participants 

The  present  study  was  conducted  using  data  from  a  validation  study  of  the  CSPro  which 
consisted  of  400  female  breast  cancer  survivors  (112).  The  study  conducted  by  Todd  and 
colleagues  (112)  was  designed  to  establish  the  reliability  and  validity  of  the  retained  items  in  the 
final  CSPro  measure.  Inclusion  criteria  for  all  participants  in  the  original  study  included  breast 
cancer  survivors  who  self-reported  female  gender,  were  diagnosed  with  breast  cancer  stages  I-III, 
had  completed  primary  anticancer  treatment  no  more  than  five  years  prior  to  the  study,  had  no 
history  of  previous  cancer  or  current  second  cancer,  were  aged  21  or  older,  and  had  access  to  the 
Internet. 

Cancer  survivors  with  a  history  of  a  previous  or  current  second  cancer  were  excluded 
from  the  study  because  survivors  with  a  history  of  multiple  cancers  report  poorer  general  health 
and psychosocial outcomes than cancer survivors with a history of a single primary cancer (110).
Cancer survivors with a stage 0 diagnosis were excluded from the study because these
individuals  tend  to  undergo  less  invasive  treatments  and  therefore  may  have  a  different  symptom 
burden  than  those  survivors  diagnosed  with  stage  I-III  cancers.  Similarly,  survivors  with  a  stage 
IV  diagnosis  or  metastatic  cancer  diagnosis  were  excluded  from  the  study  because  their 
treatment  regimen  is  typically  more  invasive  and  intense  than  that  of  cancer  survivors  with  a 
stage  I-III  diagnosis  (101)  and  such  intense  treatment  may  be  associated  with  different 
survivorship  outcomes. 

Recruitment 

The  original  study  (112)  recruited  participants  through  advertisements  and  leaflets 
disseminated to comprehensive cancer care centers, primary care clinics, support groups, hospital
bulletin boards, newspapers, and websites across the United States. Online surveys were used to
collect data from participants. After answering web-based screening materials, eligible
participants were directed to a main website to provide informed consent and complete the study
measures using an Internet-based platform.

Measures 

Findings  from  cancer  survivor  research  (15;  54)  as  well  as  experience  with  previous 
cancer  survivorship  studies  (19;  22;  53;  112)  informed  the  selection  of  the  following  measures 
from  the  original  CSPro  validation  study  for  analysis  in  the  current  study  (112). 

Demographic  and  Medical  Measures 

Participants  completed  questions  regarding  demographic  and  medical  information  using 
questions  that  our  research  group  has  used  in  three  independent  Internet  surveys  (19;  21;  53). 
Questions  are  listed  in  Appendix  2.  Demographics  consisted  of  age,  race,  education,  partner 
status,  occupational  status,  and  socioeconomic  status.  Medical  questions  included  stage  of  tumor 
at  diagnosis,  time  since  diagnosis,  treatment  received  (i.e.,  surgery,  radiation,  chemotherapy), 
adjuvant  therapies  received,  time  since  completion  of  primary  treatment,  and  menopausal  status. 

Cancer  Survivor  Profile  (CSPro) 

Participants  also  completed  the  Cancer  Survivor  Profile  (CSPro)  (Appendix  3).  The 
CSPro  is  a  screening  measure  of  problems  experienced  by  cancer  survivors  (112)  which  provides 
a  profile  of  patient-reported  problems.  The  CSPro  was  originally  developed  using  a  female 
breast  cancer  survivor  population  diagnosed  with  stages  I-III,  who  had  completed  primary 
anticancer  treatment  no  more  than  five  years  prior  to  the  study,  had  no  history  of  previous  cancer 
or current second cancer, and were aged 21 or older. The profile is a 107-item questionnaire
designed to be administered in a clinical setting with the goal of measuring problems in four
specific domains: symptom burden (anxiety, fear of recurrence, body image, pain, fatigue,
depression), function (social support, work, sleep, cognitive function, sexual function), health
behaviors (physical activity/exercise, diet), and health service needs (health competence,
patient-provider communication, economic barriers, health information). CSPro scores are provided as
standardized  t-scores  which  have  a  mean  of  50  and  a  standard  deviation  of  10.  Confidence 
intervals  (95%)  for  each  score  also  are  provided. 
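
The standardized t-score metric can be illustrated with a short Python sketch using the standard conversion T = 50 + 10z, where z is the number of standard deviations a raw score lies from the reference mean. The raw scores are hypothetical, and standardizing against the sample itself is an assumption made for illustration; the reference values actually used to score the CSPro are not given here.

```python
import statistics


def to_t_scores(raw_scores):
    """Rescale raw scores to a mean of 50 and a standard deviation of 10."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD of the reference scores
    if sd == 0:
        raise ValueError("scores must vary to be standardized")
    return [50.0 + 10.0 * (x - mean) / sd for x in raw_scores]


# Hypothetical raw domain scores for five respondents.
raw = [12, 15, 9, 20, 14]
t_scores = to_t_scores(raw)
```

On this metric, a respondent scoring 60 lies one standard deviation above the reference mean regardless of the raw scale of the domain.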

The  psychometric  properties  of  the  CSPro  have  been  evaluated  by  a  principal  component 
analysis  (a  type  of  factor  analysis)  to  provide  factor  loadings  for  the  items  within  the  four  CSPro 
domains.  Factor  loadings  indicate  how  well  a  particular  item  in  a  measure  correlates  with  the 
construct  on  which  the  item  loads  (90).  Factor  loadings  are  generally  considered  weak  if  less 
than  .4,  moderate  if  .4  to  .6,  and  strong  if  above  .6  (49).  Preliminary  psychometric  data  for  the 
CSPro  items  show  good  factor  validity.  Six  constructs  of  the  symptom  burden  domain  have 
item  factor  loadings  ranging  from  .51  to  .92.  The  function  domain  consists  of  five  constructs 
with  item  factor  loadings  in  the  range  of  .62  to  .94.  The  two  constructs  on  the  health  behaviors 
domain have item factor loadings ranging from .58 to .82. The health service needs domain
comprises four constructs with item factor loadings ranging from .70 to .92. Three of the
original  constructs  (alcohol  consumption,  cigarette  smoking,  and  fertility  distress)  could  not  be 
included  in  the  preliminary  factor  analysis  because  these  constructs  were  not  applicable  to  all 
respondents.

Outcome Measures


The  following  outcome  measures  were  administered  to  all  participants  to  yield  two 
separate  indices  of  mental  functioning  and  physical  functioning. 

Center  for  Epidemiological  Studies  -  Depression  Scale  (CES-D) 

The  Center  for  Epidemiological  Studies  -  Depression  Scale  (CES-D)  (Appendix  4)  was 
selected  as  the  dependent  variable  measuring  mental  functioning  in  this  study.  The  CES-D  is  a 
20-item  self-report  measure  of  affective  depressive  symptoms  (93).  The  CES-D  was  selected  as 
an  outcome  measure  in  this  study  because  it  is  considered  a  gold-standard  measure  of  depression 
in  research,  has  been  validated  in  cancer  populations,  and  demonstrates  acceptable  psychometric 
properties (117). The CES-D has also been used extensively with cancer survivor populations
(40; 57; 113). Similar to studies accounting for less than half the variance in mental functioning
as  measured  by  the  Short  Form-36  (SF-36)  (16;  17),  linear  statistical  modeling  in  a  study  of 
health  and  well-being  in  female  cancer  survivors  accounted  for  48%  of  the  variance  in  the  Center 
for  Epidemiological  Studies  -  Depression  Scale  (CES-D)  when  using  various  health  and 
demographic  variables  as  predictors  (113). 

Behavioral  Risk  Factor  Surveillance  System  (BRFSS)  -  Physical  Activity 

The  Behavioral  Risk  Factor  Surveillance  System  (BRFSS)  -  Physical  Activity  scale 
(Appendix  5)  was  chosen  as  the  dependent  variable  measuring  physical  functioning  in  this  study. 
The  BRFSS  consists  of  seven  items  inquiring  about  participant  physical  activity  over  the  last 
month  (27).  This  scale  yields  a  metric  of  physical  activity  that  can  be  converted  into  a  metabolic 
equivalent  of  task  (MET)  which  is  a  standardized  measure  of  energy  expenditure  (26).  The 
BRFSS  was  chosen  as  an  outcome  measure  in  this  study  because  it  has  been  used  with  cancer 
survivor  populations  (69;  85)  and  demonstrates  acceptable  reliability  and  validity  (86;  123). 
Regarding physical function as measured by the Behavioral Risk Factor Surveillance System
(BRFSS) - Physical Activity, the percentage of variance accounted for by linear regression
models is not clear; however, logistic regression models demonstrate an association between
certain predictor variables and physical activity (as measured by the BRFSS) in cancer survivors (69).
Furthermore,  the  paucity  of  research  examining  the  variance  in  the  BRFSS  under  various 
predictive  statistical  models  suggests  an  additional  need  for  the  present  study. 

Analytic  Plan 

Descriptive  analyses  were  conducted  to  evaluate  the  characteristics  of  the  data  gathered. 
Categorical  variables  were  analyzed  for  response  frequencies  and  any  missing  data.  Continuous 
variables  were  evaluated  for  summary  statistics  (i.e.,  mean,  median,  and  standard  deviation),  data 
distribution,  and  missing  data. 

All measures were evaluated for response frequencies and patterns of missing data. To
determine the pattern of missing data, independent t-tests were computed for all continuous
variables and chi-square analyses were conducted on all categorical variables to test for
statistically significant differences between complete records and incomplete records (45).
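
The two test statistics named above can be sketched as follows. This is a minimal illustration in Python rather than the SPSS procedures used in the study; the function names and all data values are hypothetical, a pooled-variance t statistic is assumed, and p-values are omitted because they require a distribution function outside the standard library.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Independent-samples t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical continuous variable: ages in complete vs. incomplete records.
complete_ages = [48, 52, 55, 61, 47]
incomplete_ages = [50, 53, 58, 60, 49]
t_stat, t_df = pooled_t_statistic(complete_ages, incomplete_ages)

# Hypothetical categorical variable: rows = complete/incomplete, columns = partnered yes/no.
chi2_stat, chi2_df = chi_square_statistic([[30, 20], [20, 30]])
```

A non-significant statistic for a given variable would suggest that complete and incomplete records do not differ on that variable.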

Developing  the  Predictive  Models 

Following  an  approach  used  by  Comrie  (32),  the  analysis  in  the  present  study  called  for 
keeping  both  the  linear  regression  and  neural  network  models  as  simple  as  possible  to  facilitate 
comparison  of  the  results.  While  both  models  can  be  adapted  to  develop  quite  sophisticated 
predictive  architectures,  more  complexity  in  the  models  increases  the  model  differences  thereby 
reducing  the  interpretability  of  the  comparison.  Constructing  the  models  in  their  most  basic, 
straightforward  architecture  allows  for  a  more  direct  comparison  and  suggests  that  any 
differences observed likely resulted from the basic predictive model, rather than specific
adjustments  to  each  architecture. 

Using  SPSS  20,  two  statistical  models  (linear  regression  and  neural  network)  were 
constructed  for  comparison  to  predict  depression  ratings,  as  measured  by  the  CES-D,  and 
physical  activity  ratings,  as  measured  by  the  BRFSS.  The  independent  variables  in  both  models 
were  identical.  Demographic  independent  variables  were  age,  race,  education,  relationship  status 
(partnered),  employment,  and  income.  Medical  independent  variables  included  tumor  stage  at 
diagnosis,  number  of  years  post-diagnosis,  primary  anticancer  treatment  type,  adjuvant 
treatment,  number  of  years  post  treatment,  and  menopausal  status.  The  following  domains  of  the 
CSPro  represented  clinical  independent  variables:  symptom  burden,  function,  health  behavior, 
and  health  service. 

Linear  Regression  Models 

Standard  linear  regression  was  conducted  for  each  outcome  variable  (depression  and 
physical  activity)  to  determine  the  contribution  of  the  independent  variables  in  predicting  each 
outcome  variable  in  the  study.  Because  linear  regression  treats  each  predictor  variable  as  a 
covariate  (i.e.,  the  results  show  the  unique  contribution  of  each  independent  variable  when  the 
other  independent  variables  are  controlled),  there  is  no  need  to  identify  separate  covariate  terms 
for  this  analysis. 

To  evaluate  depressive  symptoms,  all  independent  variables  were  entered  into  the 
regression  model  simultaneously  to  determine  each  variable’s  unique  contribution  in  predicting 
the  CES-D  scores.  Higher  CES-D  scores  indicate  more  depressive  symptoms.  Physical  activity 
was  also  analyzed  through  a  linear  regression  with  each  independent  variable  simultaneously 
entered  into  the  model  to  determine  the  unique  contribution  in  predicting  the  physical  activity 
scores of the BRFSS. In this case, higher scores on the BRFSS represent higher levels of
physical  activity. 

Neural  Network  Models 

A  neural  network  model  was  also  constructed  using  the  same  independent  variables  and 
dependent  variables  as  those  used  in  the  linear  regression  model.  The  neural  network  model 
used was a multilayer perceptron architecture, which is a back-propagation model. The term
perceptron was coined in the late 1950s by Frank Rosenblatt and colleagues to describe a specific
neural  network  architecture  that  used  an  iterative  approach  to  learning  and  reducing  predictive 
error  by  feeding  information  forward  to  the  next  layer  in  the  model  (77).  This  term  is  adapted 
from  the  word  perception  because  the  researchers  who  developed  this  approach  believed  it 
closely  resembled  how  the  brain  processes  sensory  information  (77).  In  a  practical  sense, 
perceptron  describes  the  learning  algorithm  used  to  calculate  and  correct  synaptic  weights  with 
the  goal  of  reducing  model  error  (48;  77).  This  specific  architecture  uses  a  feed  forward  network 
such  that  the  effects  of  the  input  layer  are  sent  forward  (or  fed)  to  the  hidden  layer,  which  are 
then  sent  forward  (or  fed)  into  the  output  layer  (48).  The  hidden  layer  uses  an  algorithm  to  sum 
the  weights  of  the  inputs  from  the  input  layer  (predictor  variables)  before  sending  this 
information  to  the  output  layer  (48).  As  discussed  above,  the  input  layer  consists  of  the 
independent  (or  predictor)  variables  and  the  output  layer  consists  of  the  dependent  (or  outcome) 
variables.  The  differences  between  the  predicted  values  of  the  dependent  variables  and  the 
actual  (or  target)  values  of  the  dependent  variables  are  then  computed  as  errors  that  are  back 
propagated  through  the  model  to  adjust  the  weights  of  the  connections  between  the  layers  in 
order  to  minimize  overall  model  error  (48). 

The neural network model was constructed to include one hidden layer and set to
automatically  select  the  optimal  number  of  nodes  for  the  hidden  layer  using  an  estimation 
algorithm  (60).  See  Table  2  for  the  specific  settings  used  in  the  neural  network  model  for  this 
project.  An  activation  function  is  the  learning  rule  that  sums  the  values  of  the  inputs  from  the 
previous  layer  and  applies  an  algorithm  to  these  values  before  sending  them  to  the  next  layer 
(48).  In  this  model,  the  activation  function  for  all  nodes  in  the  hidden  layer  was  set  to  a  sigmoid 
function  (60).  Use  of  a  sigmoid  function  allows  non-linearity  to  be  introduced  into  the  neural 
network  model  (48).  For  nodes  in  the  output  layer  (prediction  nodes),  the  identity  activation 
function  was  applied.  The  identity  function  is  appropriate  for  use  with  continuous  dependent 
variables  because  it  retains  the  scale  of  the  variable  for  the  prediction  so  that  it  is  comparable  to 
the  actual  values  of  the  target  variable  (which  is  also  continuous). 
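
The architecture described above (a sigmoid hidden layer feeding an identity output node) can be sketched as a single forward pass. This is an illustrative sketch only, not the SPSS implementation; the network size, weights, and biases are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: sigmoid hidden layer, identity output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Identity activation: the weighted sum is returned unchanged, so the
    # prediction stays on the scale of the continuous outcome variable.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
prediction = forward(
    inputs=[0.0, 0.0],
    hidden_weights=[[0.5, -0.3], [0.1, 0.8]],
    hidden_biases=[0.0, 0.0],
    output_weights=[4.0, 6.0],
    output_bias=1.0,
)
```

With all inputs at zero, each hidden node outputs sigmoid(0) = 0.5, so the prediction is 4.0(0.5) + 6.0(0.5) + 1.0 = 6.0.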

A  batch  training  approach  was  used  so  that  adjustment  of  the  connection  weights  would 
be  calculated  after  all  cases  were  simultaneously  entered  into  the  model  (60).  Batch  training 
updates  the  synaptic  weights  and  calculates  error  in  the  model  only  after  all  the  information  in 
the  dataset  has  been  reviewed  (gone  through  one  complete  pass)  (60).  Although  a  train-test 
approach  was  attempted  with  the  model,  this  approach  could  not  be  successfully  implemented 
because  one  or  more  cases  in  the  testing  sample  contained  variable  values  that  did  not  occur  in 
the  training  sample  which  would  have  excluded  those  cases  from  the  final  analysis  (60).  In  order 
to  implement  supervised  learning,  a  scaled  conjugate  gradient  (SCG)  was  applied  to  determine 
the  weights  of  the  connections  in  the  model.  SCG  uses  a  step-size,  or  scaled  approach  to 
estimate  the  initial  weights  of  the  connections  between  the  layers  (74).  In  this  regard,  SCG 
reduces  the  training  time  required  because  the  initial  weights  assigned  to  the  model  by  the  SCG 
are  designed  to  produce  smaller  gradients  between  the  predicted  values  (output  layer)  and  the 
actual (or target) values of the dependent variable at the outset of training (74). To avoid
overfitting the data, the program was set to automatically detect convergence of the model so that
training iterations stopped once the model experienced no added accuracy from additional
training (60). The relative change in training error criterion (0.0001) was achieved for each
of the neural network models predicting the outcomes of interest.
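
The batch update and the relative-change stopping rule can be illustrated with a deliberately simplified sketch. Plain batch gradient descent on a one-predictor linear model stands in here for the scaled conjugate gradient optimizer, which is not reproduced; the learning rate, data values, and function name are hypothetical, and only the 0.0001 criterion is taken from the study.

```python
def train_until_converged(xs, ys, learning_rate=0.05,
                          min_relative_change=0.0001, max_epochs=10000):
    """Batch gradient descent that stops when the relative drop in training error is small."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    previous_error = None
    for epoch in range(1, max_epochs + 1):
        # Batch training: the error and gradient use every case before any update.
        residuals = [weight * x + bias - y for x, y in zip(xs, ys)]
        error = sum(r * r for r in residuals) / 2.0
        if previous_error is not None and previous_error > 0:
            relative_change = (previous_error - error) / previous_error
            if relative_change < min_relative_change:
                return weight, bias, epoch
        previous_error = error
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias, max_epochs


# Hypothetical data lying close to the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
weight, bias, epochs_run = train_until_converged(xs, ys)
```

Training ends well before the epoch cap because successive passes stop improving the training error by more than the criterion.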

Following  the  approach  used  to  construct  the  linear  regression  models,  the  neural 
network  analysis  evaluated  each  outcome  or  dependent  variable  (depressive  symptoms  from  the 
CES-D  and  physical  activity  ratings  on  the  BRFSS)  using  the  same  independent  variables 
applied  in  the  linear  regression  analysis.  These  variables  were  entered  into  the  input  layer 
simultaneously  (rather  than  using  a  stepwise  approach).  Therefore,  depressive  symptoms  were 
predicted  from  the  neural  network  model  with  all  independent  variables  concurrently  entered  into 
the  input  layer  of  the  model.  A  separate  model  was  built  to  predict  physical  activity  from  the 
neural  network  model  with  all  independent  variables  simultaneously  entered  into  the  model. 

A  specific  number  seed  was  set  and  recorded  prior  to  running  the  neural  network  model 
so  that  results  could  be  replicated.  Additionally,  the  order  of  the  cases  and  the  order  of  the 
independent  variables  were  kept  constant  (and  identical  to  the  order  used  in  the  linear  regression 
model).  These  procedures  are  not  required  in  linear  regression  models.  However,  the  adaptive, 
iterative  nature  of  a  neural  network  model  is  highly  sensitive  to  the  initial  weights  assigned  to  the 
inputs  and  the  ordering  of  cases  and  variables;  therefore,  to  replicate  study  results  exactly,  it  is 
essential  to  identify  these  elements  at  the  outset. 
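
The role of the seed can be shown with a short sketch. The seed values, the weight range of -0.5 to 0.5, and the function name are hypothetical and are not the settings used in SPSS; the 16 inputs and 4 hidden nodes match the models reported in the Results.

```python
import random


def initial_weights(n_inputs, n_hidden, seed):
    """Draw starting weights for an input-to-hidden layer from a seeded generator."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]


first_run = initial_weights(16, 4, seed=2015)
second_run = initial_weights(16, 4, seed=2015)
different_seed = initial_weights(16, 4, seed=2016)
```

Runs that share a seed start from identical weights, whereas a different seed gives a different starting point and, potentially, a different trained model.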

Table 2. Settings used in the Neural Network Models

Setting                                             Value
Rescaling Method for Covariates                     None
Number of Hidden Layers                             One
Number of Nodes in Hidden Layer                     Automatically compute
Activation Function                                 Sigmoid
Output Layer Activation Function                    Identity
Rescaling Method for Scale Dependents               None
Type of Training Method                             Batch
Optimization Algorithm                              Scaled Conjugate Gradient
Stopping Rules                                      Max steps without a decrease in error = 1
Minimum Relative Change in Training Error           0.0001
Minimum Relative Change in Training Error Ratio     0.001

Data  Analysis  for  Specific  Aims 

Specific  Aim  1:  To  determine  which  statistical  model  would  produce  the  best  model  fit 
(as  defined  by  the  model  with  the  smallest  mean  square  error  [MSE])  in  statistically  predicting 
mental  and  physical  functioning  in  breast  cancer  survivors. 

The goodness of fit for each statistical model (linear regression and neural network) in
predicting each of the two outcome variables (mental functioning as measured by the CES-D and
physical activity as measured by the BRFSS) was compared by calculating each model's MSE.
MSE has been used in previous research as a goodness of fit measure in comparing models of
linear regression and neural network analysis (99; 114). MSE is calculated by computing the
difference between the predicted values from the model and the actual values of the dependent
variable, squaring these differences, and then averaging the squared differences across all
the data (99; 114). The lower the model's MSE, the less error in the overall model, and the
better the model fit. The formula for MSE is as follows (99):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

where n represents the number of cases, $Y_i$ indicates the actual (or target) value of the ith
observation, and $\hat{Y}_i$ represents the predicted value of the ith observation provided by the model.
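
As a brief illustration, the MSE can be computed as follows. The scores are hypothetical and the function name is an assumption made for this sketch.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical outcome scores and model predictions for four cases.
actual_scores = [10.0, 14.0, 8.0, 20.0]
predicted_scores = [12.0, 13.0, 9.0, 17.0]
mse = mean_squared_error(actual_scores, predicted_scores)
```

Here the squared errors are 4, 1, 1, and 9, so the MSE is 15 / 4 = 3.75.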

Specific  Aim  2:  To  determine  which  statistical  model  would  demonstrate  greater 
statistically  predictive  accuracy  (as  defined  by  the  model  with  the  smallest  mean  absolute 
percentage  error  [MAPE])  in  statistically  predicting  mental  and  physical  functioning  in  breast 
cancer  survivors. 

The predictive accuracy for each statistical model (linear regression and neural network)
in predicting each of the two outcome variables (mental functioning as measured by the CES-D
and physical activity as measured by the BRFSS) was compared by calculating the MAPE for
each model. MAPE has been used in previous research comparing linear regression and neural
network modeling to compare the predictive accuracy of the models (99; 114). The MAPE
provides an indication of the model's mean deviation from the actual (target) value of the
dependent variable and is typically expressed as a percentage (99; 114). MAPE values of 10%
or less indicate excellent predictive accuracy (99; 114). A MAPE of 10 - 20% indicates high
predictive accuracy, 20 - 50% represents average accuracy, and higher than 50% indicates low
predictive accuracy (99; 114). The formula for MAPE is as follows (99):


$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right| \times 100\%$$

where n represents the number of cases, $Y_i$ indicates the actual (or target) value of the ith
observation, and $\hat{Y}_i$ represents the predicted value of the ith observation provided by the model.
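
The MAPE and the accuracy bands cited above can be sketched as follows. The scores and function names are hypothetical; raising an error for an actual value of zero is a design choice of this sketch, made because the percentage error is undefined in that case.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy_band(mape):
    """Map a MAPE value onto the descriptive bands cited in the text."""
    if mape <= 10:
        return "excellent"
    if mape <= 20:
        return "high"
    if mape <= 50:
        return "average"
    return "low"


# Hypothetical outcome scores and model predictions for four cases.
actual_scores = [10.0, 20.0, 40.0, 50.0]
predicted_scores = [12.0, 18.0, 30.0, 55.0]
mape = mean_absolute_percentage_error(actual_scores, predicted_scores)
band = accuracy_band(mape)
```

The four absolute percentage errors are 20%, 10%, 25%, and 10%, giving a MAPE of 16.25%, which falls in the high accuracy band.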

Specific  Aim  3:  To  determine  which  statistical  model  accounts  for  the  greatest 
independent  variable  sensitivity  (as  determined  by  global  sensitivity  analysis)  in  statistically 
predicting  mental  and  physical  functioning  in  breast  cancer  survivors. 

In  general,  sensitivity  analysis  demonstrates  the  change  in  performance  of  a  statistical 
model  when  a  specific  independent  variable  is  omitted  from  the  model  (107).  As  such, 
sensitivity  analysis  highlights  the  relative  importance  of  each  independent  variable  to  the 
performance  of  the  overall  model  (14).  Global  sensitivity  analysis  was  proposed  for  each  model 
(linear  regression  and  neural  network)  in  the  present  study  to  determine  the  contribution  of  each 
independent  variable  to  the  accuracy  of  the  model  in  statistically  predicting  each  outcome 
variable  (mental  functioning  as  measured  by  the  CES-D  and  physical  activity  as  measured  by  the 
BRFSS).  However,  limitations  of  the  neural  network  software  program  did  not  allow  the 
contributions  of  the  independent  variables  to  be  held  constant  with  the  values  of  these  variables 
in  the  full  neural  network  model.  As  a  result,  an  accurate  comparison  could  not  be  made  among 
the  MSE  of  the  full  neural  network  model  and  the  MSE  of  the  neural  network  model  with  a 
variable  omitted  and  global  sensitivity  analysis  could  not  be  conducted  as  planned  on  the  neural 
network  model.  Therefore,  an  alternative  sensitivity  analysis  was  conducted  to  compare  the 
models  and  determine  which  independent  variables  were  most  important  in  model  prediction  (see 
Results  section).  Despite  this  alternative  sensitivity  analysis,  the  global  sensitivity  analysis  was 
conducted  on  the  linear  regression  model  to  identify  any  independent  variables  that  should  be 
removed  from  the  final  model  (i.e.,  variables  that  degraded  overall  model  performance); 
therefore,  an  overview  of  global  sensitivity  analysis  follows. 

The global sensitivity of an independent variable may be expressed as the ratio of the
model's error when that independent variable is omitted from the model to the full model's error
with all independent variables included (99; 114). For example, consider a full statistical model
(with all independent variables included) that has a sum-of-squares error ($SSE_{\text{full model}}$)
of .90. If one of the independent variables ($X_i$) is omitted from the model and the new SSE
(i.e., $SSE_{X_i\ \text{omitted}}$) is .50, then the model's error decreases by .40 when the $X_i$
variable is omitted (suggesting that $X_i$ degrades the performance of the full model). Furthermore,
the contribution of $X_i$ to the full model may be expressed as the ratio of the error in the model
without $X_i$ (i.e., $SSE_{X_i\ \text{omitted}}$, .50) to the error in the full model (i.e.,
$SSE_{\text{full model}}$, .90). A ratio of < 1 indicates that the independent variable significantly
degrades the performance of the model and should be removed from the model (99; 114). This ratio
can be expressed as follows:

$$\text{Sensitivity ratio} = \frac{SSE_{X_i\ \text{omitted}}}{SSE_{\text{full model}}}$$
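
The worked example above can be reproduced in a few lines. The function name is an assumption made for this sketch; the SSE values are those given in the example.

```python
def sensitivity_ratio(sse_omitted, sse_full):
    """Ratio of the error with one variable omitted to the error of the full model."""
    if sse_full <= 0:
        raise ValueError("the full-model SSE must be positive")
    return sse_omitted / sse_full


# Values from the example: full-model SSE of .90, SSE of .50 with the variable omitted.
ratio = sensitivity_ratio(sse_omitted=0.50, sse_full=0.90)
degrades_model = ratio < 1
```

The ratio is approximately .56, which is below 1 and therefore flags the variable for removal under the rule described above.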

Although there is no statistical procedure to evaluate the impact of independent variable
sensitivity in two different statistical models, the sensitivity ratios of each independent variable
in the separate models can be compared with appropriate software. This comparison can be
accomplished by the following analysis. First, the SSE of each full model ($SSE_{\text{full model}}$) is
calculated. Next, each independent variable in the models is omitted one at a time (with
replacement) and an SSE is calculated for each of the models without that specific independent
variable (e.g., $SSE_{X_1\ \text{omitted}}$, $SSE_{X_2\ \text{omitted}}$, $SSE_{X_3\ \text{omitted}}$, ..., $SSE_{X_n\ \text{omitted}}$).
A sensitivity ratio can then be calculated from $SSE_{X_i\ \text{omitted}}$ and $SSE_{\text{full model}}$. For
example, an SSE for each statistical model can be calculated with the first independent variable
(e.g., age) omitted (referred to as $SSE_{\text{age omitted}}$). The sensitivity ratio of age can then be
calculated for both the linear regression model and the neural network model in statistically
predicting the outcome variables. The sensitivity
ratio of age for the linear regression model can be compared with the sensitivity ratio of age for
the neural network model to determine whether the sensitivity ratios for age differ between the
two statistical models in predicting mental and physical function. The age independent variable
can then be replaced and the next independent variable (e.g., gender) can be omitted from both
statistical models and evaluated for its effects on model sensitivity. Again, the sensitivity ratios
for gender can be compared between the linear regression model and the neural network model.
This process can be conducted for all independent variables in the models. As a follow-up
analysis, any independent variable with a global sensitivity ratio of < 1 significantly degrades
model performance and should be removed from the model from which this ratio was calculated
(i.e., linear regression or neural network). Each of the models (i.e.,
linear regression and neural network) should then be run and compared again, excluding
those independent variables identified by a low sensitivity ratio. Again, in the present study,
global sensitivity analysis was only possible for the linear regression model because the neural
network software used did not allow for the original weights of the independent variables to be
held constant for comparison. An alternative sensitivity analysis was employed to allow for the
comparison of the two models (see Results section).
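
The omit-one-variable procedure can be sketched for a linear regression model as follows. This is an illustrative sketch under stated assumptions: ordinary least squares is fit through the normal equations in plain Python rather than in SPSS, the two predictors and all data values are hypothetical, and the function names are invented for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_sse(rows, y):
    """Fit least squares with an intercept and return the sum-of-squares error."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((y_i - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, y_i in zip(design, y))


def sensitivity_ratios(rows, y, names):
    """Omit each predictor in turn (restoring the others) and form SSE ratios."""
    full_sse = ols_sse(rows, y)
    ratios = {}
    for idx, name in enumerate(names):
        reduced = [[v for j, v in enumerate(row) if j != idx] for row in rows]
        ratios[name] = ols_sse(reduced, y) / full_sse
    return ratios


# Hypothetical data: the outcome tracks the first predictor closely.
rows = [[1, 4], [2, 6], [3, 3], [4, 2], [5, 7], [6, 5]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
ratios = sensitivity_ratios(rows, y, names=["symptom_burden", "age"])
```

In this hypothetical example the ratio for the first predictor is very large and the ratio for the second is close to 1. Note that when a least squares model is fit and evaluated on the same cases, omitting a predictor cannot lower the SSE, so ratios below 1 would be expected only for iteratively trained models or for error computed on separate cases.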

Although  linear  regression  approaches  are  commonly  used  predictive  statistical  models  in 
cancer  survivor  research,  comparing  a  linear  regression  model  to  a  neural  network  model  may 
reveal  a  more  sensitive  approach  or  different  pattern  of  relationships  for  understanding  which 
factors  are  related  to  mental  and  physical  functioning  in  cancer  survivors.  The  non-linear, 
adaptive nature of neural networks allows these models to learn the characteristics of the data and
use an iterative approach to reduce the overall error in the model's predictions, suggesting that
neural network analysis may be a more sensitive, accurate predictive model than linear
regression.

CHAPTER 4: Results


Descriptive  Analyses 

The  original  dataset  consisted  of  400  breast  cancer  survivors.  Only  those  participants 
who  had  complete  data  on  all  variables  evaluated  in  the  current  study  were  retained  in  this 
analysis.  This  approach  resulted  in  194  breast  cancer  survivor  participants  in  the  models 
predicting  depression  scores  on  the  Center  for  Epidemiological  Studies  -  Depression  Scale 
(CES-D),  and  192  breast  cancer  survivors  in  the  models  predicting  physical  activity  scores  on  the 
Behavioral  Risk  Factor  Surveillance  System  (BRFSS).  Chi-square  and  independent  sample  t-test 
analyses  demonstrated  no  significant  differences  for  those  participants  retained  in  the  present 
study  as  compared  to  those  who  were  removed  on  any  of  the  independent  variables  in  the  models 
(see  Table  3).  With  regard  to  the  dependent  variables  in  the  models,  no  significant  differences 
were observed for retained versus removed participants on CES-D scores (t(362) = -.54, p = .59);
however, the BRFSS scores were significantly different (χ² = 15.80, df = 3, p = .00) for
those  participants  who  were  retained  as  compared  to  those  removed.  Participant  demographic, 
medical,  and  clinical  characteristics  for  each  dependent  variable  are  presented  in  Table  4. 

Bivariate  correlations  assessing  direction  and  strength  of  relationships  for  each  of  the 
independent  variables  and  dependent  variables  are  presented  in  Table  5  for  CES-D  and  Table  6 
for  BRFSS.  Collinearity  diagnostics  were  also  conducted  to  evaluate  any  multicollinearity 
among  the  independent  variables.  Although  some  level  of  multicollinearity  is  to  be  expected 
between  certain  predictors,  high  levels  of  multicollinearity  can  artificially  reduce  the  statistical 
significance  of  the  affected  predictor  variables  and  increase  the  likelihood  of  a  Type  II  error  (89). 
In this preliminary analysis, variables were considered collinear if they demonstrated a variance
inflation factor (VIF) greater than or equal to 10, or a tolerance value less than .10 (90). No
multicollinearity was identified for the independent variable relationships; therefore, all variables
were retained in the initial analyses.
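
The collinearity criteria can be expressed in a short sketch. Tolerance is taken as 1 minus the R-squared obtained when a predictor is regressed on the remaining predictors, and VIF as its reciprocal; the R-squared values used below are hypothetical and the function names are assumptions of this sketch.

```python
def tolerance_and_vif(r_squared):
    """Tolerance (1 - R^2) and variance inflation factor (1 / tolerance)."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in the interval [0, 1)")
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def is_collinear(r_squared, vif_cutoff=10.0, tolerance_cutoff=0.10):
    """Apply the cutoffs used in the text: VIF >= 10 or tolerance < .10."""
    tolerance, vif = tolerance_and_vif(r_squared)
    return vif >= vif_cutoff or tolerance < tolerance_cutoff


# Hypothetical R^2 values for two predictors regressed on the other predictors.
moderate_tolerance, moderate_vif = tolerance_and_vif(0.50)
flag_moderate = is_collinear(0.50)
flag_severe = is_collinear(0.95)
```

A predictor that shares half of its variance with the others has a VIF of 2 and is retained, whereas one that shares 95% has a VIF of 20 and would be flagged.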

Following  this  preliminary  analysis,  the  two  predictive  models  (linear  regression  and 
neural  network)  were  constructed  for  each  dependent  variable:  CES-D  and  BRFSS.  Models 
predicting  the  same  dependent  variable  were  compared  on  model  fit,  predictive  accuracy,  and 
independent  variable  sensitivity  analysis. 

Table 3. Significant Differences for Retained versus Removed Participants for each Dependent Variable

Characteristic                        CES-D (N = 194)                 BRFSS (N = 192)

Demographic Characteristic
  Age                                 t = 1.84, df = 277, p = .07     t = 1.74, df = 277, p = .08
  Race                                χ² = .87, df = 4, p = .93       χ² = .41, df = 4, p = .98
  Education                           χ² = 10.16, df = 6, p = .12     χ² = 8.06, df = 6, p = .23
  Partnered                           χ² = .00, df = 1, p = 1.00      χ² = .30, df = 1, p = .59
  Employment                          χ² = 1.31, df = 3, p = .73      χ² = 1.14, df = 3, p = .77
  Income                              χ² = 2.73, df = 6, p = .84      χ² = 5.75, df = 6, p = .45

Medical Characteristic
  Tumor stage at diagnosis            χ² = 1.83, df = 2, p = .40      χ² = 1.54, df = 2, p = .46
  Years since diagnosis               t = .08, df = 393, p = .93      t = .05, df = 393, p = .96
  Primary anticancer treatment type   χ² = 1.95, df = 4, p = .74      χ² = 3.10, df = 4, p = .54
  Adjuvant treatment received         χ² = .05, df = 1, p = .83       χ² = .00, df = 1, p = .96
  Years since treatment               t = .97, df = 389, p = .34      t = .94, df = 389, p = .35
  Menopausal status                   χ² = 1.71, df = 2, p = .43      χ² = 2.69, df = 2, p = .26

Clinical Characteristic/CSPro Domain
  Symptom Burden                      t = -1.08, df = 359, p = .28    t = -.89, df = 359, p = .38
  Function                            t = -.35, df = 358, p = .73     t = .13, df = 358, p = .90
  Health Behavior                     t = -1.0, df = 383, p = .31     t = -.36, df = 394, p = .72
  Health Service                      t = -.72, df = 365, p = .47     t = -.45, df = 365, p = .65

CES-D = Center for Epidemiological Studies - Depression Scale, BRFSS = Behavioral Risk
Factor Surveillance System - Physical Activity, CSPro = Cancer Survivor Profile




Table 4. Participant Demographic, Medical, and CSPro Characteristics for each Dependent Variable

                                            CES-D                     BRFSS
Characteristic                              N       %                 N       %

Cancer History
  Breast cancer survivor                    194     100%              192     100%

Demographic Characteristic
  Age                                       M = 50.62 (SD = 10.81)    M = 50.65 (SD = 10.69)
  Race
    Asian                                   4       2.1%              4       2.1%
    Black or African American               12      6.2%              11      5.7%
    Caucasian                               172     88.7%             171     89.1%
    Native American/Alaska Native           2       1.0%              2       1.0%
    Other                                   4       2.1%              4       2.1%
  Education
    High school                             14      7.2%              14      7.3%
    Some college                            41      21.1%             40      20.8%
    Associates degree                       26      13.4%             25      13.0%
    Bachelors degree                        44      22.7%             42      21.9%
    Some graduate school                    16      8.2%              17      8.9%
    Graduate degree                         53      27.3%             54      28.1%
  Partnered
    No                                      62      32.0%             64      33.3%
    Yes                                     132     68.0%             128     66.7%
  Employment
    Unemployed by choice                    40      20.6%             38      19.8%
    Unemployed not by choice                19      9.8%              21      10.9%
    Working full-time                       104     53.6%             105     54.7%
    Working part-time                       31      16.0%             28      14.6%
  Income
    Less than $10,000                       7       3.6%              8       4.2%
    $10,000 - $19,000                       6       3.1%              7       3.6%
    $20,000 - $39,000                       27      13.9%             24      12.5%
    $40,000 - $59,000                       37      19.1%             38      19.8%
    $60,000 - $79,000                       38      19.6%             36      18.8%
    $80,000 - $99,000                       26      13.4%             27      14.1%
    $100,000 or more                        53      27.3%             52      27.1%

Medical Characteristic
  Tumor stage at diagnosis
    Stage I                                 74      38.1%             73      38.0%
    Stage II                                73      37.6%             73      38.0%
    Stage III                               47      24.2%             46      24.0%
  Years since diagnosis                     M = 2.86 (SD = 1.96)      M = 2.86 (SD = 1.95)
  Primary anticancer treatment type
    Surgery only                            19      9.8%              20      10.4%
    Surgery + Chemotherapy                  35      18.0%             36      18.8%
    Surgery + Radiation                     30      15.5%             27      14.1%
    Chemotherapy + Radiation                1       0.5%              1       0.5%
    Surgery, Chemotherapy, and Radiation    109     56.2%             108     56.3%
  Adjuvant treatment received
    Yes                                     116     59.8%             114     59.4%
    No                                      78      40.2%             78      40.6%
  Years since treatment                     M = 2.09 (SD = 1.49)      M = 2.09 (SD = 1.49)
  Menopausal status
    Premenopausal                           46      23.7%             44      22.9%
    Premenopausal before cancer/            78      40.2%             79      41.1%
      postmenopausal after treatment
    Postmenopausal                          70      36.1%             69      35.9%

Clinical Characteristic/CSPro Domain
  Symptom Burden                            M = 79.96 (SD = 20.63)    M = 79.78 (SD = 20.71)
  Function                                  M = 49.84 (SD = 13.63)    M = 49.53 (SD = 13.60)
  Health Behavior                           M = 9.11 (SD = 2.12)      M = 9.04 (SD = 2.10)
  Health Service                            M = 48.53 (SD = 14.33)    M = 48.35 (SD = 14.34)

CES-D = Center for Epidemiological Studies - Depression Scale, BRFSS = Behavioral Risk
Factor Surveillance System - Physical Activity, M = mean, SD = standard deviation, CSPro =
Cancer Survivor Profile

CSPro-HS = Health Service Domain

Table 6. Pearson correlations for variables in BRFSS analysis


Preliminary  Model  Performance 


The standard linear regression predicting depression scores (CES-D) from 16 independent
variables (age, race, education, partner status, employment, income, stage at diagnosis, time
since diagnosis, treatment received, adjuvant treatment, time since treatment, menopausal
status, symptom burden, function, health behavior, and health service needs) yielded a
statistically significant model that explained 65.4% of the variance in depression scores,
F (16, 177) = 20.951, p < .001. The standard linear regression predicting physical activity
scores (BRFSS) used the same 16 independent variables and also yielded a statistically
significant model, which accounted for 21.3% of the overall variance, F (16, 175) = 2.952,
p < .001. With regard to neural network analysis, there is no absolute criterion for
determining the statistical significance of a model; however, as described above, the neural
network models developed in the present study used the same 16 independent (predictor)
variables to predict the same dependent (output) variables so that the two statistical
approaches could be compared. The original neural network model predicting depression scores
(CES-D) resulted in 4 nodes (plus the bias node) in the hidden layer (see Figure 4). The
neural network model predicting physical activity (BRFSS) also resulted in 4 nodes (plus the
bias node) in the hidden layer (see Figure 5). As a reminder, nodes in the hidden layer serve
the dual purpose of summing the weighted inputs and applying an activation function to this
sum, which allows non-linearity to be introduced into the predictive model. Findings from
these analyses are explored in further detail below.
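
The role of a hidden node described above can be sketched in a few lines of Python. This is
a minimal illustration under stated assumptions, not the SPSS implementation: it assumes a
sigmoid hidden-layer activation and an identity output, and every weight and input value
below is a hypothetical placeholder rather than a value from the fitted models.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_node(inputs, weights, bias):
    """Sum the weighted inputs plus the bias weight, then apply the activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

def predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with an identity output: a linear blend of hidden activations."""
    hidden = [hidden_node(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))

# Hypothetical network with two inputs and two hidden nodes (illustrative values only).
x = [0.5, -1.0]
W = [[0.8, -0.4], [-0.3, 0.9]]   # one row of input weights per hidden node
b = [0.1, -0.2]                  # bias weight for each hidden node
v = [1.5, -2.0]                  # hidden-to-output weights
y_hat = predict(x, W, b, v, output_bias=0.25)
```

Because the sigmoid is applied before the hidden activations are combined, the prediction is
no longer a straight-line function of the inputs, which is the source of the non-linearity
referred to above.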
Figure 4. Full Neural Network Model of Depression Scores

Note: The size of each rectangle provides a pictorial representation of the contribution of
that independent variable to the prediction of depression scores (Table 8 illustrates the
numerical importance of each independent variable in the predictive model).
Figure 5. Full Neural Network Model of Physical Activity Scores

Note: The size of each rectangle provides a pictorial representation of the contribution of
that independent variable to the prediction of physical activity scores (Table 8 illustrates
the numerical importance of each independent variable in the predictive model).
To provide a thorough basis from which to compare the two models, a variety of
recommended statistical indices (120) are provided in Table 7. The observed mean (O)
and standard deviation (s_o) of the actual (or observed) scores on the dependent variables
are provided at the top of the table. The predicted mean (P) and standard deviation (s_p)
are also provided. In this analysis, the standard deviation of all predicted values is
lower than the observed standard deviation of the dependent variable, indicating that both
models failed to fully account for the variability in the dataset. Perhaps most notable is
the marked disparity between the standard deviation of the neural network model
predicting CES-D scores and the observed standard deviation. This disparity is not as
pronounced in the linear regression model and suggests that the neural network model
was less able to capture the true variance represented in the original data. Model
performance is further highlighted when comparing the observed ranges (Range_o) to the
model-predicted ranges (Range_p). In both cases, the neural network model demonstrated
a markedly restricted predictive range as compared to the observed values of the
dependent variable, suggesting the neural network model may have been less sensitive to
more extreme scores.

Mean bias error (MBE) is a general indicator of whether a model over- or under-predicts
scores on the dependent variable, with under-prediction indicated by a negative
MBE value (28). MBE is calculated by subtracting the observed values of the dependent
variable from the values predicted by the model (P - O) and averaging these differences.
As shown in Table 7, the neural network model tended toward slight under-prediction for
both dependent variables.
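
As a rough illustration of the indices discussed above, the following Python sketch computes
the MBE along with the standard deviation and range of observed and predicted scores. The
function names and the five scores are hypothetical and chosen only to make the arithmetic
easy to follow; they are not data from the present study.

```python
from statistics import mean, stdev

def mean_bias_error(observed, predicted):
    """Average of (predicted - observed); a negative value indicates under-prediction."""
    return mean(p - o for o, p in zip(observed, predicted))

def spread_summary(values):
    """Sample standard deviation and range, the two spread indices compared in the text."""
    return {"sd": stdev(values), "range": max(values) - min(values)}

# Hypothetical scores, for illustration only.
observed = [2, 5, 9, 14, 30]
predicted = [6, 7, 10, 12, 20]

mbe = mean_bias_error(observed, predicted)   # mean of (4, 2, 1, -2, -10)
obs_spread = spread_summary(observed)
pred_spread = spread_summary(predicted)
```

In this toy example the predictions have a smaller standard deviation and a narrower range
than the observed scores, the same pattern of compressed predictions described for the
models in Table 7.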
Table 7. Full Predictive Models Performance Statistics


One measure of error used in both the linear regression and the neural network
models is the sum-of-squares error (SSE; also commonly referred to as the residual sum of
squares) (58; 60). The SSE is calculated by squaring and summing the differences between
the actual and predicted values of the dependent variable, yielding an overall sum of the
squared errors in model prediction. The SSE alone may not provide sufficient information
to compare predictive models; however, the SSE can be divided by the residual degrees of
freedom to produce the Mean Square Error (MSE). MSE provides a general indicator of
overall model fit, with a lower MSE suggesting a better fitting model (99; 114).
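
The error measures defined above can be sketched as follows. This is an illustrative
sketch only: the residual degrees of freedom are assumed to be n - k - 1 (the usual value
for a regression with an intercept and k predictors), and the scores are hypothetical.

```python
import math

def sse(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def mse(observed, predicted, n_predictors):
    """SSE divided by the residual degrees of freedom (assumed here to be n - k - 1)."""
    df_resid = len(observed) - n_predictors - 1
    return sse(observed, predicted) / df_resid

def rmse(observed, predicted, n_predictors):
    """Square root of the MSE, expressed in the units of the dependent variable."""
    return math.sqrt(mse(observed, predicted, n_predictors))

# Hypothetical scores with a single predictor (k = 1), for illustration only.
observed = [3, 5, 7, 9, 11, 13]
predicted = [2, 6, 7, 10, 10, 13]

total_sse = sse(observed, predicted)      # 1 + 1 + 0 + 1 + 1 + 0 = 4
model_mse = mse(observed, predicted, 1)   # 4 / (6 - 1 - 1) = 1.0
model_rmse = rmse(observed, predicted, 1)
```

The same square-root relationship links the MSE and RMSE values reported for the full
models: for example, the square root of an MSE of 59.70 is approximately 7.73.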

Comparison  of  Model  Fit 

In this study, best model fit was defined as the model yielding the smallest MSE.
The first specific aim of the present study was to determine which statistical model, linear
regression or neural network, produced the best model fit in statistically predicting
depressive symptoms and physical functioning in breast cancer survivors. The a priori
hypotheses for this specific aim were that the neural network models would yield a lower
MSE, indicating better model fit, than the linear regression models in predicting both
depressive symptoms and physical activity. For the models predicting depression scores
(CES-D), this hypothesis was not confirmed (see Table 7). In this case, the linear
regression demonstrated a lower MSE (59.70) than the neural network (60.37),
suggesting the regression analysis provided a better overall model fit of
depression scores than the neural network model; however, the difference in MSE
between the two models is less than one, indicating no appreciable difference between the
two models. For the models predicting physical activity scores (BRFSS), this hypothesis was
confirmed, with a lower neural network MSE (0.62) than regression model MSE
(1.09), indicating a better model fit for the neural network analysis. Here again, though,
there was no appreciable difference between the models.

The Root Mean Square Error (RMSE) is also provided in Table 7. RMSE is
referred to in regression analysis as the standard error of the estimate, indicating the
standard deviation of the error (or residual) term. RMSE is calculated by taking the square
root of the MSE. RMSE is a useful statistic in that it provides a more intuitive
understanding of the size of the model's typical prediction error: because the squared term
has been removed, the measurement is in the same units as the original data.
Here, the linear regression demonstrated an RMSE of 7.73 compared to the neural
network RMSE of 7.77 in predicting depression (CES-D). Again, the difference between
the RMSE of each model was negligible. Similarly, the neural network RMSE (0.79)
was not markedly different from the linear regression RMSE (1.04) in predicting physical
activity (BRFSS).

Comparison  of  Model  Predictive  Accuracy 

High predictive accuracy was defined in the present study by the lowest mean
absolute percentage error (MAPE). The MAPE indicates the model's mean deviation
from the observed value of the dependent variable and is typically represented as a
percentage (99; 114). To calculate the MAPE, mean absolute error (MAE) is first
calculated by taking the average of the absolute values of the model's prediction errors.
Like the RMSE, the MAE statistic uses the same units as the original data and, as a result,
may be a more intuitive indicator of the overall accuracy of the model. However,
multiplying the MAE by 100% converts this statistic into the MAPE, which is a
percentage measurement that can be compared to a criterion of performance to assess
predictive accuracy. Specifically, MAPE values of < 10% suggest excellent predictive
accuracy; 10 - 20% suggests high predictive accuracy; 20 - 50% indicates average
accuracy; and > 50% suggests low predictive accuracy (99; 114).

The second specific aim of this study was to determine which predictive model,
linear regression or neural network, produced the highest predictive accuracy in
statistically predicting depressive symptoms and physical functioning in breast cancer
survivors. The a priori hypotheses for this specific aim were that the neural network
models would have a lower MAPE, indicating better accuracy, than the linear regression
models in predicting both depressive symptoms and physical activity. These hypotheses
were not confirmed for either dependent variable (see Table 7). In both cases, the linear
regression model produced a lower MAPE (CES-D MAPE = 583.52%; BRFSS MAPE =
83.12%) than the neural network model (CES-D MAPE = 844.52%; BRFSS MAPE =
92.31%), suggesting higher predictive accuracy for the regression analyses. However,
neither model performed particularly well in accurately predicting the dependent
variables. Both models produced an extremely high MAPE when predicting depression
scores (CES-D); and, although the models fared somewhat better in predicting physical
activity scores (BRFSS), the MAPE values were still much higher than the threshold
criterion of 50%, suggesting markedly low predictive accuracy for both models. This
poor performance may be explained in part by a characteristic of the MAPE statistic.
Specifically, the MAPE is highly sensitive to large percentage errors in small predictive
zones (59). For example, if the actual score is 5 and the model predicts a score of 10,
this equates to a 100% error; however, if the actual score is 100 and the model
predicts a score of 70, this is only a 30% error despite a much larger absolute
error than that produced in the smaller zone. Indeed, this bias of the MAPE may be most
evident in the models predicting depressive scores because the actual values of the CES-D
indicated a non-normal distribution (Kolmogorov-Smirnov statistic = .14, p < .001)
and a positive skew of 1.02, suggesting a distribution with more scores in the lower
range. Similarly, BRFSS scores also showed a non-normal distribution (Kolmogorov-Smirnov
statistic = .22, p < .001). However, actual values of the BRFSS consisted of a
much smaller range of scores (i.e., 1 to 4) and demonstrated a slightly negative skew of
-.33. This negative skew indicates more scores in the higher range, which would suggest
less opportunity for the MAPE to be biased by predictive errors. However, the MAPE
may also be lower in the models predicting BRFSS simply because of the smaller range
of scores.
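
The sensitivity of percentage errors to small observed scores can be illustrated with a
short Python sketch. Note the assumption: the sketch uses the common per-case definition of
absolute percentage error (each absolute error divided by its observed score), whereas the
text derives its MAPE from the MAE; the function names and the two cases are illustrative
only.

```python
def mae(observed, predicted):
    """Mean absolute error, in the units of the dependent variable."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def absolute_percentage_errors(observed, predicted):
    """Per-case |error| / |observed| as a percentage; undefined when an observed score is 0."""
    return [100 * abs(o - p) / abs(o) for o, p in zip(observed, predicted)]

def mape(observed, predicted):
    """Mean of the per-case absolute percentage errors (common definition, assumed here)."""
    ape = absolute_percentage_errors(observed, predicted)
    return sum(ape) / len(ape)

# Two illustrative cases: a low observed score and a high observed score.
observed = [5, 100]
predicted = [10, 70]

ape = absolute_percentage_errors(observed, predicted)  # per-case percentage errors
abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
```

The low score produces the larger percentage error even though its absolute error (5
points) is far smaller than that of the high score (30 points). A positively skewed
variable with many low scores therefore tends to inflate a percentage-based error statistic.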

Comparison  of  Model  Sensitivity  Analysis 

Sensitivity analysis demonstrates the change in performance of a statistical model
when a specific independent variable is omitted from the model (107). As such,
sensitivity analysis highlights the relative importance of each independent variable to the
performance of the overall model (14). The third specific aim of this study was to
determine which statistical model accounted for the greatest independent variable
sensitivity in predicting depressive scores (CES-D) and physical activity (BRFSS) in
post-treatment breast cancer survivors. The comparison measure for this aim
was a global sensitivity analysis. The global sensitivity of an independent variable may
be expressed as the ratio of the model's error when that independent variable is
omitted to the full model's error with all independent variables included (99; 114). In
this case, error refers to the sum-of-squares error (SSE). Given this information, an error
ratio can be calculated for each independent variable. Ratios < 1 indicate that the
independent variable significantly degrades the performance of the model and should be
removed from the model (99; 114). The a priori hypotheses for this specific aim were
that the neural network models would account for greater independent variable
sensitivity than the linear regression models in predicting both depressive symptoms and
physical activity. Originally, the analytic plan called for calculating global sensitivity
ratios in the manner outlined above and comparing the ratios of the two predictive models
for each independent variable. However, this analysis could not be conducted for the
neural network model. Because of the iterative nature of the neural network, the model
behaved erratically when variables were removed and did not provide sufficient data to
determine accurate global sensitivity ratios as described here. Specifically, the original
connection weights of the remaining variables could not be maintained, which precluded an
accurate comparison to determine the neural network's global sensitivity ratio. This
unexpected effect may be the result of the neural network software used in the present
study (SPSS Neural Network Add-on), which did not allow the contributions of the
independent variables to be held constant at their values in the full neural network model.
As a result, an accurate comparison could not be made between the MSE of the full neural
network model and the MSE of the neural network model with a variable omitted. Therefore,
global sensitivity analysis could not be conducted as planned on the neural network model.
This analysis was conducted for the linear regression model and revealed that no variables
should be removed from the model (i.e., none had a ratio < 1 that would suggest
significantly degraded model performance).
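
For the linear regression case, the global sensitivity ratio described above can be sketched
as a refit of the model with each predictor omitted in turn. The sketch below is a minimal,
dependency-free illustration: the helper names (ols_sse, sensitivity_ratios) and the data
are hypothetical, and the least-squares fit is obtained from the normal equations rather
than from the statistical software used in the study.

```python
def ols_sse(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares and return the sum-of-squares error."""
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y))

def sensitivity_ratios(rows, y, names):
    """Ratio of SSE with one predictor omitted to the SSE of the full model."""
    full = ols_sse(rows, y)
    ratios = {}
    for j, name in enumerate(names):
        reduced = [[v for i, v in enumerate(r) if i != j] for r in rows]
        ratios[name] = ols_sse(reduced, y) / full
    return ratios

# Hypothetical data: two predictors and an outcome, for illustration only.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [3.7, 2.4, 7.2, 6.6, 11.7, 10.3, 15.6, 14.4]
ratios = sensitivity_ratios(rows, y, ["x1", "x2"])
```

One property worth noting: when a least-squares model is refit and evaluated on the same
data, omitting a predictor cannot lower the SSE, so ratios computed this way are never
below 1. Ratios below 1 can arise only when error is evaluated differently, for example on
held-out cases.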

Although the original specific aim could not be evaluated as proposed, an
alternative sensitivity analysis was conducted to evaluate which independent variables
were most important in model prediction. Independent variable importance is a
sensitivity analysis that demonstrates how much the variance accounted for by the model
would decrease if the specific variable were removed from the model (60; 78; 90). Using
independent variable importance sensitivity analysis, independent variables can be ranked
and ordered by level of importance in model prediction (60; 78; 90). In linear regression,
this information is provided by the semipartial correlation coefficients (or part
correlations), represented by the statistic sr. The semipartial correlation coefficient can
be squared (sr²) and then multiplied by 100% to provide the percentage of variance in the
dependent variable (i.e., CES-D or BRFSS) uniquely explained by the particular
independent variable (90). For example, in the linear regression model of CES-D, the
CSPro-Symptom Burden variable had an sr² of .0864, which suggests that CSPro-Symptom
Burden uniquely explains 8.64% of the variance in the CES-D. Equivalently, if
the CSPro-Symptom Burden variable were removed from the model, the variance in CES-D
scores explained by the overall model would decrease by 8.64%. In the neural network
analysis, these coefficients are provided in an independent variable importance output
(60), and can be converted in the same manner (squared and multiplied by 100%) to yield
the same information.
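
The link between the squared semipartial correlation and the drop in explained variance can
be checked numerically. The sketch below uses the standard two-predictor formulas as a
simplified stand-in for the 16-predictor models; the three correlations are hypothetical
values chosen for illustration.

```python
import math

def semipartial_sq(r_y1, r_y2, r_12):
    """Squared semipartial correlation of predictor 1 in a two-predictor regression."""
    sr = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)
    return sr ** 2

def r_squared_two(r_y1, r_y2, r_12):
    """R-squared of a regression of y on two predictors, from their correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations: y with x1, y with x2, and x1 with x2.
r_y1, r_y2, r_12 = 0.60, 0.40, 0.30

sr2 = semipartial_sq(r_y1, r_y2, r_12)
# Removing x1 leaves a one-predictor model whose R-squared is r_y2 squared.
drop_in_r2 = r_squared_two(r_y1, r_y2, r_12) - r_y2 ** 2
percent_unique = 100 * sr2
```

The two quantities agree: sr² for a predictor equals the amount by which R² falls when that
predictor is removed. Applying the same conversion to the reported sr² of .0864 gives the
8.64% of variance uniquely attributed to CSPro-Symptom Burden.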

Table 8 provides the comparisons and totals of the independent variable
importance analysis, with each model's predictor variables presented in rank order of
importance. The a priori hypotheses were that the neural network models would
demonstrate a higher aggregate sr² than the linear regression models in predicting both
depressive symptoms and physical activity. In the models predicting depressive scores
(CES-D), the hypothesis was confirmed in that the neural network model produced a total
sr² of .1993 as compared to the linear regression's aggregate sr² of .1321. Although these
findings suggest that the neural network model uniquely accounted for more of the
variance in the CES-D, they must be examined in the context of the
results from the first two specific aims, which suggested that the neural network model
did not outperform the linear regression model with regard to goodness of fit or
predictive accuracy in predicting CES-D. With regard to the models predicting BRFSS,
the hypothesis was not confirmed. In this case, the linear regression model produced a
higher aggregate sr² of .1474 as compared to the neural network's total sr² of .1322.
Despite these findings, the difference between the total sr² of the models appears to be
non-significant.

Perhaps a more interesting comparison is provided when examining which
variables were considered most important in the overall predictions of each model.
Traditional linear regression analysis provides a threshold criterion for statistical
significance of independent variables in predicting the dependent variable of interest.
This statistic is presented as the p-value, where a value < .05 indicates that the specific
independent variable made a unique, statistically significant contribution to the model's
overall prediction of the dependent variable (90). Although no specific statistical
criterion exists for retaining variables in a neural network model (50; 73; 118), variables
that uniquely accounted for > 1% of the variance in the dependent variable were
examined for comparison to those independent variables identified as statistically
significant in the linear regression analysis. In the linear regression model predicting
depressive scores (CES-D), the two statistically significant variables were CSPro-Symptom
Burden (beta = .54, p < .001) and CSPro-Function (beta = .27, p = .001). Four
variables in the neural network model had independent variable importance values
of > 1%. These variables were CSPro-Function, age, CSPro-Symptom Burden, and
CSPro-Health Service Needs. In the linear regression model
predicting physical activity (BRFSS), the only statistically significant variable was
CSPro-Health Behavior (beta = -.37, p < .001). By comparison, the neural network
model identified four independent variables with importance values > 1%. These
variables were CSPro-Health Behavior, years since diagnosis, CSPro-Symptom Burden,
and age. Again, the results of this sensitivity analysis should be examined in the context
of all the statistical analyses, which suggested no appreciable differences in the
performance of the linear regression and the neural network model in the areas of
goodness of model fit and predictive accuracy for both dependent variables.

However, comparison of the two approaches may have clinical relevance.
Specifically, the neural network model showed that four variables (CSPro-Function, age,
CSPro-Symptom Burden, and CSPro-Health Service) were related to depressive scores in
this breast cancer survivor sample, whereas the linear regression demonstrated statistical
significance for only CSPro-Symptom Burden and CSPro-Function. Similarly, the neural
network model predicting physical activity scores revealed four correlates (CSPro-Health
Behavior, years post-diagnosis, CSPro-Symptom Burden, and age) compared to the linear
regression, which identified only one statistically significant variable (CSPro-Health
Behavior). These findings suggest that the neural network models identified more
variables as related to the outcomes of interest than the linear regression models did in
predicting both depressive and physical activity scores. These results support the notion
that neural networks may account for more complexity in variable relationships and that
different patterns of variables may be important clinically with regard to depression and
physical activity.
Table 8. Independent Variable Importance Analysis

Models  predicting  CES-D  (N  =  194)  |  Models  predicting  BRFSS  (N  =  192) 


Post-hoc  Analyses 


Several  post-hoc  analyses  were  conducted  to  determine  the  power  of  the  models 
to  detect  an  effect  size  given  this  particular  dataset  and  to  determine  whether  an 
empirically  pruned  predictive  model  would  yield  more  accuracy  in  predictions. 

Power  Analysis 

Post hoc power analyses were conducted using G*Power (43). Effect sizes (f²)
were determined post hoc using the R-square (R²) of the full linear regression models for
each dependent variable to determine Cohen's effect size for an F-test (29; 104). When
effect sizes are measured using Cohen's f² statistic, values of .02 are considered small,
.15 are medium, and .35 are considered large (30). The effect size for the regression
model predicting CES-D was large at 1.89, and the effect size for the regression model
predicting BRFSS was medium at a value of .27. These effect sizes were then used to
conduct the post hoc G*Power analysis of the two regression models. Power for the
model predicting CES-D was sufficient at 1.00, as was the calculated power for the
model predicting BRFSS at .99. Unfortunately, there are no statistical procedures to
determine an a priori or post hoc power analysis for a neural network model (12),
although many researchers advocate for large sample sizes in neural network analysis
(50; 73).
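
The reported effect sizes can be reproduced from the R² values of the full regression
models using Cohen's formula f² = R² / (1 - R²). The short sketch below is illustrative;
the function names are hypothetical, and the "negligible" label for values under .02 is an
added convenience rather than part of the cited benchmarks.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared effect size for a multiple regression F-test."""
    return r_squared / (1.0 - r_squared)

def effect_size_label(f2):
    """Benchmarks from the text: .02 small, .15 medium, .35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"  # hypothetical label for values below the smallest benchmark

f2_cesd = cohens_f2(0.654)    # full regression model predicting CES-D
f2_brfss = cohens_f2(0.213)   # full regression model predicting BRFSS
```

Rounded to two decimals, these values match the effect sizes of 1.89 and .27 reported
above, which fall in the large and medium ranges, respectively.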

Hierarchical  Regression  Model 

A hierarchical regression analysis was conducted to evaluate the ability of the
clinical predictor variables (i.e., CSPro domains) to predict scores of depression (CES-D)
and physical activity (BRFSS), after controlling for the demographic and medical
predictor variables. Hierarchical analysis is not possible with the SPSS Neural Network
program; therefore, this analysis was not conducted with a neural network model in the
present study.

For the regression predicting depression scores (Table 9), all demographic
variables (age, race, education, partner status, employment, income) were entered at Step
1. Demographic variables in Step 1 explained a significant amount of the variance (R² =
.16) in depression scores. The addition of the medical variables (stage at diagnosis, time
since diagnosis, treatment received, adjuvant treatment, time since treatment, menopausal
status) in Step 2 resulted in a non-significant increase in R² of .01. In the final block,
Step 3, the clinical variables were added. The addition of clinical variables (symptom
burden, function, health behavior, health service needs) demonstrated a substantial and
statistically significant increase in R² of .48 (p < .001) and brought the total variance
accounted for by this model as a whole to 65%, F (16, 177) = 20.95, p < .001. The
statistically significant findings of this hierarchical model indicated that the clinical
variables accounted for 48% of the variance in depression scores, over and above the
influence of demographic and medical variables, F change (4, 177) = 61.76, p < .001. In
the final model, only the CSPro-Symptom Burden (beta = .54, p < .001) and CSPro-Function
(beta = .27, p = .001) domains were statistically significant.

The hierarchical regression predicting physical activity scores (Table 10)
followed the same procedure outlined above, with demographic variables entered at Step
1. Demographic variables (age, race, education, partner status, employment, income)
accounted for a non-significant amount of the variance (R² = .03) in physical activity
scores. The medical variables (stage at diagnosis, time since diagnosis, treatment
received, adjuvant treatment, time since treatment, menopausal status) entered at Step 2
represented a non-significant increase in R² of .01. In Step 3, the clinical variables
(symptom burden, function, health behavior, health service needs) were added and
demonstrated a large and statistically significant increase in R² of .17 (p < .001). This
model as a whole accounted for 21% of the total variance in physical activity scores, F (16,
175) = 2.95, p < .001. The statistically significant findings of this hierarchical model
demonstrated that the clinical variables accounted for 17% of the variance in physical
activity scores, over and above the influence of demographic and medical variables, F
change (4, 175) = 9.59, p < .001. In the final model, only the CSPro-Health Behavior
domain was statistically significant (beta = -.37, p < .001).
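
The F statistics reported for the hierarchical models follow from the R² values and degrees
of freedom. The sketch below shows the standard formulas; the function names are
hypothetical. Because it starts from the rounded R² values printed in the text, its results
only approximate the reported statistics, which were presumably computed from unrounded
values.

```python
def f_overall(r2, n_predictors, df_resid):
    """Overall F for a regression: (R2 / k) / ((1 - R2) / residual df)."""
    return (r2 / n_predictors) / ((1 - r2) / df_resid)

def f_change(delta_r2, n_added, r2_full, df_resid):
    """F for the increase in R2 when a block of n_added predictors enters the model."""
    return (delta_r2 / n_added) / ((1 - r2_full) / df_resid)

# CES-D model: 16 predictors, 177 residual df, clinical block of 4 adds .48 to R2.
f_cesd = f_overall(0.654, 16, 177)
f_change_cesd = f_change(0.48, 4, 0.654, 177)

# BRFSS model: 16 predictors, 175 residual df, clinical block of 4 adds .17 to R2.
f_brfss = f_overall(0.213, 16, 175)
f_change_brfss = f_change(0.17, 4, 0.213, 175)
```

With these rounded inputs the sketch yields values close to the reported statistics (about
20.9 and 61.4 for CES-D, and about 3.0 and 9.5 for BRFSS).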
.001, **p < .001; CES-D = Center for Epidemiological Studies - Depression Scale


Table  9.  Hierarchical  Regression  Predicting  Depression  Scores  (CES-D) 


.001,  **p  <  .001;  BRFSS  =  Behavioral  Risk  Factor  Surveillance  System  -  Physical  Activity 


Table  10.  Hierarchical  Regression  Predicting  Physical  Activity  Scores  (BRFSS) 


Model  Comparisons  of  Significant  Independent  Variables 

Meyer and colleagues (73) suggest that, rather than compare neural network and
regression models, regression models should provide the researcher with an empirical
approach to selecting appropriate predictor variables for the neural network model (50;
73). In this regard, linear regression may be seen as a necessary first step in determining
the subsequent predictors of a neural network. Using this approach, only the statistically
significant variables identified in the hierarchical regression analysis were used to
evaluate the predictive abilities of the two models, linear regression and neural network,
resulting in a pruned predictive model. Specifically, only CSPro-Symptom Burden and
CSPro-Function were simultaneously entered as predictors in both the pruned linear
regression and pruned neural network models predicting depression scores (CES-D). The
pruned neural network model predicting depression scores resulted in 2 hidden nodes
(plus the bias node). Similarly, only CSPro-Health Behavior was entered as a predictor
in both the pruned linear regression and pruned neural network models predicting
physical activity scores (BRFSS). This pruned neural network model yielded 1 hidden
node (plus the bias node). Weights for the neural network models are presented in Table 11
and Table 12. Diagrams for these models are provided in Figure 6 and Figure 7.
Table 11. Weights of Pruned Neural Network Model Predicting Depression Scores

Pruned NN Model predicting CES-D (N = 194)

Input Layer | Hidden Layer H(1:1) | Hidden Layer H(1:2)
Bias | -.417 | .261
CSPro-S | .065 | -.145
CSPro-F | -.046 | -.109

CES-D = Center for Epidemiological Studies - Depression Scale, CSPro-S = Symptom
Burden Domain, CSPro-F = Function Domain


Output layer activation function: Identity

Figure  6.  Diagram  of  Pruned  Neural  Network  Model  Predicting  Depression  Scores 
SXBURDEN_CSPro  Domain  =  CSPro  Symptom  Burden  Domain;  FXN_CSPro  Domain 
=  CSPro  Function  Domain;  NoMD_CESD_TOTAF  =  CES-D  Depression  Scores 
Note:  The  size  of  the  rectangle  provides  a  pictorial  representation  of  the  contribution  of 
that  independent  variable  to  the  prediction  of  depression  scores  (Table  8  illustrates  the 
numerical  importance  of  each  independent  variable  in  the  predictive  model) 
Table 12. Weights of Pruned Neural Network Model Predicting Physical Activity

Pruned NN Model predicting BRFSS (N = 192)

Input Layer | Hidden Layer H(1:1)
Bias | 2.051
CSPro-HB | -.015

BRFSS = Behavioral Risk Factor Surveillance System - Physical Activity, CSPro-HB =
Health Behavior Domain


Hidden layer activation function: Sigmoid
Output layer activation function: Identity

Figure  7.  Diagram  of  Pruned  Neural  Network  Model  Predicting  Physical  Activity  Scores 
HEALTHBX_CSPro  Domain  =  CSPro  Health  Behavior  Domain; 
NoMD_BRFSS_ActivityLevel  =  BRFSS  Physical  Activity  Scores 
Note:  The  size  of  the  rectangle  provides  a  pictorial  representation  of  the  contribution  of 
that  independent  variable  to  the  prediction  of  physical  activity  scores  (Table  8  illustrates 
the  numerical  importance  of  each  independent  variable  in  the  predictive  model) 
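
As a reading aid, the input-to-hidden weights in Table 12 can be turned into a small forward-pass calculation. The sketch below assumes that the single hidden unit applies the sigmoid function reported with Figure 7 and that the output unit is an identity function of the hidden activation. The hidden-to-output weights (`out_bias`, `out_weight`) are not reported in this section, so they appear only as hypothetical arguments, and the input is assumed to be on whatever scale the software used after rescaling.

```python
import math

# Input-to-hidden weights as reported in Table 12.
HIDDEN_BIAS = 2.051
HIDDEN_WEIGHT_CSPRO_HB = -0.015


def sigmoid(z):
    """Logistic sigmoid, the hidden-layer activation named with Figure 7."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activation(cspro_hb):
    """Activation of hidden unit H(1:1) for one (rescaled) CSPro-HB input."""
    return sigmoid(HIDDEN_BIAS + HIDDEN_WEIGHT_CSPRO_HB * cspro_hb)


def predict_activity(cspro_hb, out_bias, out_weight):
    """Identity output layer. out_bias and out_weight are hypothetical:
    the hidden-to-output weights are not reported in Table 12."""
    return out_bias + out_weight * hidden_activation(cspro_hb)
```

Because the CSPro-HB weight is negative, the hidden activation falls as the Health Behavior score rises; how that translates into predicted activity depends on the sign of the unreported output weight.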
The comparative results of these analyses are presented in Table 13. In this case,
statistical pruning degraded the performance of the models predicting
depression scores (CES-D). The decrease was marginal for the pruned
linear model of depression scores, which yielded a slightly higher MSE and MAPE than
the original model (see Table 7 for original model
statistics). However, the degradation was marked for the pruned neural
network model of depression scores, which showed notable increases in the MSE,
MAPE, and MBE relative to the original model (see Table 7). The larger MBE in this
case, defined as the mean observed score minus the mean predicted score, suggested
that the pruned neural network model tended to underpredict depression scores (mean
predicted = 33.35 versus mean observed = 35.42). Statistical pruning also
decreased model performance for the pruned linear model predicting
physical activity scores (BRFSS). Again, the degradation was negligible but produced a
higher MSE and MAPE than the original linear model (see Table 7). However,
predictive accuracy improved slightly for the pruned neural
network model predicting physical activity scores: the pruned neural
network model demonstrated a slightly lower MSE and MAPE in predicting physical
activity than the original model (see Table 7).
Table 13. Pruned Predictive Models Performance Statistics

              Models predicting CES-D (N = 194)         Models predicting BRFSS (N = 192)
              O = 35.42                                 O = 2.78
              sO = 12.59                                sO = 1.12
              RangeO = 20.00 to 74.00                   RangeO = 1.00 to 4.00

              Linear              Neural                Linear              Neural
Statistic     Regression          Network               Regression          Network
P             35.42               33.35                 2.78                2.77
sP            9.96                1.46                  0.45                0.38
RangeP        14.94 to 61.50      27.37 to 35.05        1.07 to 3.86        1.72 to 3.70
MBE           0.00                2.07                  0.00                0.01
SSE           11430.95            14020.49              202.45              101.67
MSE           59.85               73.41                 1.07                0.54
RMSE          7.74                8.57                  1.03                0.73
MAE           5.97                9.27                  0.87                0.88
MAPE          596.92%             927.46%               87.35%              88.37%
R²            .63**               n/a                   .16**               n/a

**p < .001; CES-D = Center for Epidemiological Studies - Depression Scale; BRFSS =
Behavioral Risk Factor Surveillance System - Physical Activity; O = average of the
observed values of the dependent variable; sO = standard deviation of the observed values
of the dependent variable; RangeO = range of the observed values of the dependent
variable; P = average of the predicted values of the dependent variable; sP = standard
deviation of the predicted values of the dependent variable; RangeP = range of the
predicted values of the dependent variable; MBE = mean bias error (difference between
the average observed and average predicted values); SSE = sum of squares error; MSE =
mean square error; RMSE = root of mean square error; MAE = mean absolute error;
MAPE = mean absolute percent error; R² = proportion of variance explained by the
model
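
The error statistics named in the note to Table 13 can be illustrated with a short calculation. The sketch below uses conventional textbook definitions; the exact formulas used by the software in this study are not given in this section, so the code is an illustration rather than a reproduction. The `n_params` argument is an assumption: the MSE values reported in Table 13 appear consistent with dividing SSE by the sample size minus the number of fitted coefficients, so the function allows either denominator.

```python
import math


def error_metrics(observed, predicted, n_params=0):
    """Conventional error statistics for paired observed/predicted values.

    n_params is an assumed option: 0 divides SSE by n, while a positive
    value divides by n - n_params (a degrees-of-freedom denominator).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    if n - n_params <= 0:
        raise ValueError("n_params must be smaller than the number of cases")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")

    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mse = sse / (n - n_params)
    return {
        # Mean observed minus mean predicted, as defined in the table note.
        "MBE": sum(observed) / n - sum(predicted) / n,
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE": 100.0 * sum(abs(r) / abs(o) for r, o in zip(residuals, observed)) / n,
    }
```

Under this sign convention, a positive MBE means that the average prediction falls below the average observed score.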
CHAPTER  5:  Discussion


This  study  investigated  whether  artificial  neural  network  modeling  demonstrated 
better  accuracy  (as  defined  by  less  predictive  error)  than  traditional  linear  regression  in 
predicting  depressive  symptoms  and  physical  activity  levels  in  a  sample  of  post-treatment 
breast  cancer  survivors.  The  two  predictive  models  were  compared  on  measures  of 
goodness of fit, predictive accuracy, and independent variable importance. The results of
the present study generally did not support the a priori hypotheses that neural networks
would outperform linear regression models on these measures. There are a number of
possible reasons for these results.

This study does indicate several important clinical findings. The results from the
linear regression provide new evidence of the importance of the clinical domains of the
CSPro as correlates of mental and physical functioning in this sample of post-treatment
breast cancer survivors. Additionally, the linear regression and neural network
analyses, two very different statistical approaches with different underlying assumptions
about the relationships among independent and dependent variables, produced similar
findings with regard to specific CSPro domains. The neural network approach also
identified additional variables that may be of clinical importance and may have a
non-linear relationship with the dependent variables.

Strengths  and  Clinical  Implications 

Relevant  clinical  implications  are  suggested  from  these  findings.  Of  the 
performance  metrics  used  in  the  present  study,  mean  square  error  (MSE)  may  be  the  most 
meaningful  measure  to  determine  whether  a  particular  statistical  model  is  performing 
well;  however,  the  clinical  value  of  these  statistical  models  is  best  determined  by  the 
independent variable importance analysis. In the present study, the independent variable
importance analysis identified the important correlates of the outcomes of interest for
both statistical models. With regard to these findings, the neural network model not only
confirmed the findings of the linear regression, but also suggested potentially important
non-linear relationships of age and years post-diagnosis with the outcomes of
interest. These findings suggest that the neural network model may be capturing
complexity in the relationships among these variables that the linear regression did
not detect, and that additional predictor variables (age and years post-diagnosis)
may have clinical relevance with regard to mental and physical health status in this
post-treatment sample of breast cancer survivors.

The post-hoc hierarchical regression analysis revealed statistically significant
relationships between depressive symptoms on the CES-D measure and both the CSPro
Symptom Burden domain and the CSPro Function domain (Table 9). Greater
difficulties with symptoms and functioning were associated with higher depression
scores. While causality cannot be determined, these associations highlight the
importance of managing symptom burden, function, and depressive mood. Additionally,
the hierarchical regression showed that the clinical variables (which are modifiable)
remained statistically significant over and above the
effects of the demographic and medical variables, many of which are non-modifiable.
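
The "over and above" comparison in a hierarchical regression is conventionally summarized as the change in R² between the reduced model (demographic and medical variables only) and the full model (clinical variables added), tested with an F statistic. The sketch below shows that standard calculation; the function name and all numeric inputs are hypothetical and are not values from this study.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F test for the increase in R-squared when predictors are added.

    r2_reduced, r2_full: R-squared of the nested reduced and full models.
    n: number of cases; k_reduced, k_full: number of predictors in each model.
    Returns (delta_r2, f_statistic, (df_numerator, df_denominator)).
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    df_num = k_full - k_reduced          # predictors added in the later block
    df_den = n - k_full - 1              # residual degrees of freedom, full model
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    delta_r2 = r2_full - r2_reduced
    f_statistic = (delta_r2 / df_num) / ((1.0 - r2_full) / df_den)
    return delta_r2, f_statistic, (df_num, df_den)
```

A large F value relative to its degrees of freedom indicates that the added block explains variance beyond what the earlier block already accounts for.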

Modifiable variables identified as important by both the neural network and linear
regression models in predicting depression scores include the CSPro Symptom Burden
and CSPro Function domains. These findings suggest that interventions designed to reduce
symptoms and improve function could have an appreciable, positive impact on
decreasing depressive symptoms. The CSPro Symptom Burden domain encompasses
areas  such  as  fatigue,  depressive  symptoms,  anxiety,  pain,  fear  of  recurrence,  body 
image,  and  fertility  distress.  The  Function  domain  of  the  CSPro  involves  social 
relationships,  work,  sexual  function,  cognitive  function,  and  sleep  disturbance.  Specific 
interventions  aimed  at  reducing  problems  in  both  symptom  burden  and  functional  areas 
may  include  psychoeducation,  various  forms  of  counseling  (individual,  couples,  group, 
web-  or  telephone-based),  and  behavioral  strategies  such  as  exercise  and  stress 
management techniques (23; 44; 70; 71; 92; 98; 105; 121). While symptom burden and
function were also identified using the neural network approach, this technique identified
two additional survivor variables: age and challenges in the ability to obtain health
care (Health Service Needs Domain of the CSPro) (Table 8). These findings suggest that
the neural network model may be taking more factors into account and possibly
identifying more complexity among relationships. Specifically, age and health service
needs may have an important non-linear relationship with mental functioning. This
represents a relevant clinical finding suggesting that age and health service needs may
have a marked impact on mental functioning; however, the actual shape of these
non-linear relationships is unclear and should be further defined to better aid clinical
decision-making.

Regarding  predictors  of  physical  activity  (BRFSS),  both  the  neural  network 
model  and  the  hierarchical  regression  model  identified  the  CSPro  Health  Behavior 
domain  as  an  important  predictor  (Tables  8  and  10).  The  significance  of  clinical 
predictors  in  the  hierarchical  regression  model  was  also  above  and  beyond  the  effects  of 
the  demographic  and  medical  variables.  Higher  scores  on  the  CSPro  Health  Behavior 
domain were associated with lower scores on the BRFSS, suggesting that poor health
behaviors (poor diet and exercise) were correlated with lower levels of physical activity.
Here again, these results are promising because health behaviors can be
modified through various forms of psychoeducation, behavioral strategies, and behavioral
programs, ultimately improving physical activity and possibly quality of life (39; 61; 63;
71; 88; 92; 116). In addition to health behavior, the neural network model also identified
time since diagnosis, symptom burden, and age as important predictors of physical
activity on the BRFSS (Table 8), suggesting again that the neural network model
may be identifying complexity in the relationships among the predictor variables and
physical activity scores. This suggests that time since diagnosis, symptom burden, and
age may have important non-linear relationships with physical functioning. These factors
are clinically important with regard to physical functioning, but the actual
shape of these non-linear relationships is not known and should also be further defined to
assist clinical decision-making regarding physical functioning.

Model  Performance 

Although  the  linear  regression  model  produced  a  slightly  lower  overall  error  than 
the  neural  network  model,  the  differences  were  negligible  and  neither  model  performed 
particularly well on the selected metrics of interest. This is most evident in the
MAPE findings, none of which fell below the threshold criterion of 50% (values above
this threshold suggest markedly low predictive accuracy). It is unknown whether the sample size was
adequate for neural network analysis, but this explanation does not account for the poor
performance of the regression model, since the post hoc analysis suggested adequate
power to detect an effect. Another possibility, as stated previously, lies in the
non-normal distributions of the dependent variables. However, both parametric tests and
neural  network  models  are  quite  robust  with  regard  to  non-normal  data  (48;  72),  and  data 
resulting  from  psychological  research  is  quite  often  non-normally  distributed  (90).  An 
alternative  possibility  is  that,  although  the  selection  of  independent  variables  was  based 
on  theoretical  assumptions  and  experience,  there  may  be  different  predictor  variables  that 
could  be  included  to  increase  the  overall  accuracy  of  the  models.  Although  the  present 
study  did  not  support  the  original  hypotheses,  the  predictive  models  explored  should  not 
be  abandoned  on  the  basis  of  these  metrics  alone  as  the  results  suggest  important  clinical 
findings. 

Both  statistical  approaches  include  general  strengths  and  weaknesses  which  were 
observed  in  this  study.  Linear  regression  has  the  benefit  of  being  simple  to  use,  simple  to 
understand,  and  easy  to  interpret;  however,  linear  regressions  cannot  model  non-linear 
relationships  and  therefore  may  not  be  suitable  to  model  more  complex  relationships 
among  predictor  and  outcome  variables.  Neural  network  models  do  have  the  capability 
to  identify  non-linear  relationships  and  may  be  better  able  to  capture  complex 
relationships among variables; however, this approach is much more difficult to interpret, and
the relationships among the variables are not easily understood. Because of these
strengths and weaknesses, researchers studying these statistical approaches have
suggested that a complementary use of the two models (linear regression and neural
network) may be the optimal approach (73; 118). Because neural networks do not
allow researchers to see a direct relationship between predictors and the outcome of
interest, these models alone may not be sufficient in aiding researchers to develop
targeted interventions for identified outcomes, such as mental and physical functioning.
Conversely, traditional statistical models do provide information regarding relationships
among variables, but may not capture more complex, non-linear relationships inherent in
clinical samples. Despite the findings in the present study using fewer predictor variables
in the pruned models (Tables 11-13), linear regression may be a useful method to reduce
the number of variables before entering them in a neural network model analysis. In this
way, linear regression can identify specific relationships that lead researchers to
targeted interventions, while neural network models can further clarify the overall
relationships identified by the regression model.

Limitations 

Although  a  post  hoc  power  analysis  demonstrated  moderate  to  large  effect  sizes  in 
the  dataset  and  substantial  power  to  detect  these  effects  using  a  linear  regression,  our 
sample  size  may  have  been  too  small  for  a  neural  network  model.  There  are  no  a  priori 
methods  to  determine  sufficient  sample  size  for  neural  network  models  (12).  Although 
neural  networks  can  be  used  with  small  datasets,  small  sample  sizes  can  decrease  the 
generalizability  of  the  results  and  may  make  the  analysis  more  susceptible  to 
multicollinearity  and  overfitting  the  data  (12;  73;  96).  Previous  neural  network  research 
with  similar  design  parameters  to  the  current  study  and  a  relatively  small  sample  size  has 
demonstrated  acceptable  generalizability  of  model  results  to  a  validation  sample. 
Specifically,  Baxt  (13)  conducted  a  neural  network  analysis  with  20  independent  (input) 
variables  to  determine  the  occurrence  of  myocardial  infarction  in  351  patients  who 
presented  with  chest  pain.  The  results  of  this  analysis  showed  good  generalization  to  a 
validation  sample  of  over  300  patients.  However,  data  in  the  present  study  were  not 
found  to  be  multicollinear,  and  the  results  suggested  that  underfitting  (rather  than 
overfitting) was a significant problem. Therefore, neural network models may be less
stable with smaller sample sizes. In fact, Warner and Misra (118) propose that traditional
statistical approaches may actually be preferred over neural network modeling for small
sample sizes, noting that regression models perform better when theory or experience
suggests the underlying relationship between the factors studied, whereas neural network
models are more useful for uncovering a previously unknown functional relationship
among factors. They argue that this feature makes neural networks data dependent and
therefore better able to perform as the sample size increases.

Conversely,  overfitting  is  most  likely  to  occur  in  neural  networks  when  the 
sample  size  is  too  large  (95).  When  a  neural  network  overfits  the  data  on  which  it  is 
trained,  the  generalizability  of  the  model  can  be  significantly  degraded  (52).  There  are 
measures  in  place  to  reduce  the  likelihood  of  overfitting,  such  as  early  stopping  rules  (a 
measure  employed  in  this  study)  and  train-test-validation  sets;  however,  there  are  no  such 
measures  to  account  for  problems  associated  with  small  data  sets  in  neural  network 
modeling. When the data are known to the researcher, such problems may be detected by
a careful investigation of the model’s behavior as compared to the actual data, as was
done in this investigation. However, a better solution would be the development of an a
priori  power  analysis  for  various  neural  network  architectures,  such  as  a  best  practice 
rule-of-thumb  calculation  or  an  empirically  derived  formulation  such  as  that  provided  by 
G*Power  for  traditional  statistical  modeling. 
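
The early stopping rule mentioned above can be sketched generically: training continues only while error on a held-out validation set keeps improving. The example below is a minimal illustration under stated assumptions, not the rule implemented in the software used in this study. For brevity it fits a straight line by gradient descent rather than a neural network, and the `patience` parameter, learning rate, and function names are hypothetical.

```python
def mean_squared_error(data, w, b):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, valid, lr=0.1, max_epochs=1000, patience=10):
    """Full-batch gradient descent that stops when validation error stalls.

    Training halts once the validation error has failed to improve for
    `patience` consecutive epochs; the best parameters seen are returned.
    """
    w, b = 0.0, 0.0
    best = {"val_mse": float("inf"), "w": w, "b": b, "best_epoch": 0}
    stalled = 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        epochs_run = epoch
        n = len(train)
        # Gradient of the training MSE with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
        val = mean_squared_error(valid, w, b)
        if val < best["val_mse"]:
            best = {"val_mse": val, "w": w, "b": b, "best_epoch": epoch}
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    best["epochs_run"] = epochs_run
    return best
```

The rule guards against training too long on the training cases, but, as noted above, it does nothing to compensate for a training set that is simply too small.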

Another  possible  reason  for  the  unexpected  poor  performance  of  neural  network 
models  in  this  study  may  lie  in  the  characteristics  of  the  data.  Specifically,  if  the 
relationship  among  the  independent  variables  and  dependent  variables  is  truly  linear,  then 
a  linear  regression  would  naturally  be  a  more  appropriate  model  to  determine  these
relationships.  True  linearity  is  not  always  known  but  may  be  expected  when  most  of  the 
predictor  variables  in  a  model  are  dichotomous  because  this  suggests  that  their 
contribution  to  the  overall  model  is  on  a  linear  scale  (118).  However,  in  the  present 
research,  only  two  of  the  predictor  variables  were  binary;  therefore,  if  the  true 
relationship  among  the  variables  is  linear,  it  is  likely  not  an  artifact  of  the  variable  scale. 

The  neural  network  software  application  turned  out  to  be  somewhat  limited  for 
the  current  study.  Specifically,  the  SPSS  Neural  Network  Add-on  program  did  not  allow 
the  planned  analysis  for  Specific  Aim  3  (global  sensitivity  analysis)  to  be  conducted  for 
the  neural  network  model  because  the  software  did  not  allow  the  connection  weights  in 
the model to be held constant for follow-up comparisons. As a result, global sensitivity
analysis could only be carried out on the linear regression model, which precluded a direct
comparison of this analysis with the neural network model. An alternative
sensitivity analysis (independent variable importance) was therefore conducted to evaluate Specific
Aim 3.
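
To illustrate why fixed connection weights matter for this kind of follow-up, the sketch below runs a simple one-at-a-time sensitivity sweep on a frozen model: each input is varied across its range while the others are held at baseline, and the spread of the predictions is recorded. This is an assumed, simplified stand-in for the planned analysis, not the method specified for Specific Aim 3, and the function and argument names are hypothetical. It requires only a `predict` callable whose parameters do not change between calls.

```python
def one_at_a_time_sensitivity(predict, baseline, ranges, steps=11):
    """Spread of predictions when each input is swept with others held fixed.

    predict: callable taking a dict of input values and returning a number;
             its internal weights must stay constant across calls.
    baseline: dict of reference values for every input.
    ranges: dict mapping input name to (low, high) sweep limits.
    Returns a dict mapping input name to max(prediction) - min(prediction).
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    spreads = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for i in range(steps):
            inputs = dict(baseline)  # copy so other inputs stay at baseline
            inputs[name] = low + (high - low) * i / (steps - 1)
            outputs.append(predict(inputs))
        spreads[name] = max(outputs) - min(outputs)
    return spreads
```

If the model were refit between sweeps, differences in the spreads would mix the effect of the input with the effect of the changed weights, which is the obstacle described above.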

Finally,  this  study  was  a  cross-sectional  analysis  and  cannot  provide  causal 
predictions  or  information  on  changes  that  occur  in  cancer  survivor  health  status  over 
time. 

Future  Research 

This  study  demonstrated  clinically  relevant  findings  with  regard  to  the  importance 
of  the  clinical  domains  on  the  CSPro  as  related  to  mental  and  physical  functioning  in  this 
post-treatment  sample  of  breast  cancer  survivors.  However,  no  analysis  was  conducted  to 
evaluate  which  subscales  of  the  global  clinical  domains  were  significant  correlates  of 
mental  and  physical  functioning.  For  example,  subscales  of  the  Symptom  Burden
Domain  include  anxiety,  pain,  fear  of  recurrence,  body  image,  fatigue,  and  depression.  It 
would  be  useful  to  identify  which  specific  subscales  contribute  to  significant  changes  in 
mental  and  physical  functioning  outcomes  to  develop  more  targeted  interventions. 

The  potential  clinical  significance  of  identifying  unique  variables  in  the  neural 
network  model  should  not  be  underestimated.  In  addition  to  the  clinical  domain  variables 
(CSPro  domains)  identified  by  both  statistical  approaches,  age  (demographic  variable) 
and  time  since  diagnosis  (medical  variable)  were  also  identified  in  the  neural  network  as 
potentially  important  predictors.  These  findings  suggest  that  the  neural  network  model  is 
highlighting  important  non-linear  relationships  among  these  predictor  variables  and  the 
dependent variables. The shape or nature of these relationships is not known.
Understanding the shape of these relationships could provide additional clinical utility for
clinicians and patients in understanding the trajectory of patients’ mental and physical health
status. Future research should clarify the exact shape of these non-linear relationships
(e.g., oscillating function, exponential function, etc.).

Future  research  in  this  area  should  also  include  larger  sample  sizes  to  compare 
and  contrast  neural  network  models  to  traditional  statistical  models  in  predicting 
psychosocial  factors  in  cancer  survivors  to  decrease  any  possible  confound  of  small 
sample  size  in  neural  network  analysis.  If  recruiting  a  large  number  of  participants  is 
unrealistic,  a  resampling  method  such  as  bootstrapping  may  be  useful.  Bootstrapping 
involves  a  computer-generated,  repeated  random  sampling-with-replacement  from  the 
full  set  of  known  cases  to  produce  random  samples  for  analysis  that  characteristically 
differ  from  the  original  sample.  This  resampling  method  allows  the  same  sample  to  be 
repeatedly  used  for  statistical  analysis  in  an  effort  to  offset  the  drawbacks  of  a  small
dataset. 
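
The resampling procedure described above can be sketched in a few lines. The example below draws repeated samples with replacement from a set of known cases, recomputes a statistic on each resample, and summarizes the results with a simple percentile interval. The function names, the default of 1,000 resamples, and the percentile method are illustrative assumptions rather than a procedure used in this study.

```python
import random
import statistics


def bootstrap_statistic(values, statistic=statistics.fmean, n_resamples=1000, seed=0):
    """Recompute `statistic` on repeated resamples drawn with replacement."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # seeded so the resamples are reproducible
    n = len(values)
    return [statistic(rng.choices(values, k=n)) for _ in range(n_resamples)]


def percentile_interval(estimates, alpha=0.05):
    """Simple percentile interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    low_index = int((alpha / 2) * (len(ordered) - 1))
    high_index = int((1 - alpha / 2) * (len(ordered) - 1))
    return ordered[low_index], ordered[high_index]
```

In a model comparison, `values` could hold per-case errors and `statistic` an error summary, so that the variability of each model's error can be gauged from the original sample alone.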

Given  the  unexpected  problems  encountered  in  this  project  with  regard  to 
sensitivity  analysis,  careful  consideration  should  be  given  to  the  particular  software 
application used. The SPSS Neural Network Add-on (60) may be best used in projects that
require only a basic application of the neural network model. STATISTICA is neural
network software that has been used by other researchers to evaluate the same specific
aims outlined in this project, including global sensitivity analysis (99; 114). Other
programs that have this capability include R: neuralnet (51) and MATLAB (108).
However,  aside  from  SPSS,  these  software  packages  require  varying  levels  of  familiarity 
with  programming  code  to  run  more  advanced  neural  network  analysis. 

The  present  study  is  cross-sectional  in  nature  and,  therefore,  does  not  provide 
information  on  changes  that  can  occur  in  cancer  survivor  health  status  over  time.  As  a 
result,  the  research  design  does  not  allow  the  investigators  to  examine  the  differential 
performance  of  the  predictive  statistical  models  with  multiple  measures  over  time  (i.e., 
traditional statistical models compared to neural network models). A prospective
follow-up study could examine the trajectory of breast cancer survivors’ mental and physical
health status over time by evaluating these factors immediately after treatment and then
again 1 and 5 years later. A comparison of the performance of predictive statistical
models with these prospective data may also be informative.

Because this study focuses on the comparison of two statistical models, no
conclusion can be drawn about the effect that modifying each independent variable would
have on cancer survivor functioning. In the future, an intervention study would be
needed to determine the impact of specific interventions suggested by both analytic
approaches on depression and physical activity.

Conclusions 

Neural  network  models  did  not  outperform  linear  regression  analysis  in  predicting 
mental  and  physical  functioning  in  this  sample  of  post-treatment  breast  cancer  survivors. 
However,  neural  network  models  may  still  be  useful  in  modeling  cancer  survivors’ 
mental  and  physical  functioning.  Both  linear  regression  and  neural  network  modeling 
identified  modifiable  variables  (clinical  domains  of  the  CSPro)  as  important  correlates  of 
post-treatment  mental  and  physical  functioning.  The  neural  network  model  also  added  to 
the  results  by  identifying  additional  variables  (age,  time  since  diagnosis)  that  have  some 
type  of  non-linear  relationship  with  mental  and  physical  functioning.  These  findings  may 
promote  a  better  understanding  of  post-treatment  health  status. 
APPENDICES


Appendix  2:  Demographic  and  Medical  Survey 


Please  complete  the  following  questions. 

What  is  your  date  of  birth? 


What  is  your  age? 


What  is  your  highest  level  of  education? 

1. Less than high school

2.  High  school 

3.  Some  college 

4.  Associate’s  degree 

5.  Bachelor’s  degree 

6.  Some  graduate  school 

7.  Graduate  degree 

What  is  your  marital  status? 

1.  Single 

2.  Single,  cohabitating 

3.  Married 

4.  Divorced 

5.  Widowed 

What  is  your  race? 

1. Asian

2.  Black  or  African  American 

3.  Caucasian 

4.  Hispanic  or  Latino 

5. Native American/Alaska Native

6.  Native  Hawaiian  or  Pacific  Islander 

7.  Other 

What  is  your  employment  status? 

1. Unemployed (by choice)

2.  Unemployed  (not  by  choice) 

3.  Work  full-time 

4.  Work  part-time 

If  you  work,  what  is  your  job  title? 
What is your estimated household income?

1. Less than $10,000

2. $10,000 - $19,000

3. $20,000 - $39,000

4. $40,000 - $59,000

5. $60,000 - $79,000

6. $80,000 - $99,000

7. $100,000 or more

What  stage  of  cancer  were  you  diagnosed  with? 

1. Stage I

2.  Stage  II 

3.  Stage  III 

Were  you  treated  with  surgery  for  cancer? 

1.  Yes 

2.  No 

Were  you  treated  with  chemotherapy  for  cancer? 

1.  Yes 

2.  No 

Were  you  treated  with  radiation  for  cancer? 

1.  Yes 

2.  No 

Did  you  receive  any  adjuvant  treatment  for  cancer? 

1.  Yes 

2.  No 

Did  you  receive  other  treatment  for  cancer? 

1.  Yes 

2.  No 

What  was  the  date  you  were  diagnosed  with  cancer? 

Month: _ 

Day: _ 

Year: _ 

What  was  the  date  that  all  primary  treatment  (surgery,  chemotherapy,  radiation)  was 
completed? 

Month: _ 

Day: _ 

Year: _ 
What is your menopausal status?

1. Pre-menopausal prior to cancer, post-menopausal after treatment

2.  Pre-menopausal  prior  to  treatment,  pre-menopausal  after  treatment 

3.  Post-menopausal  before  diagnosis  or  treatment 
Appendix  3:  Cancer  Survivor  Profile  (CSPro)

(112) 

Given  your  life  as  it  is  now,  how  do  you  feel  about  having  had  cancer? 
Mark  the  box  that  best  describes  how  much  you  agree  or  disagree  with  each 
statement. 

1. Having had cancer makes me feel uncertain about my health.

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

2.  I  worry  about  the  future. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

3.  Having  had  cancer  makes  me  feel  unsure  about  the  future. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

4.  I  worry  about  cancer  coming  back. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

5.  New  symptoms  make  me  worry  about  the  cancer  coming  back. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

6.  I  worry  about  my  health. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

7.  I  feel  disfigured. 

1  =  Strongly  disagree 

2  =  Disagree 
3  =  Neutral

4  =  Agree 

5  =  Strongly  agree 

8.  I  sometimes  wear  clothing  to  cover  parts  of  my  body. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

9.  I  worry  about  how  my  body  looks. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 


The  following  questions  are  about  having  a  family. 

Mark  the  box  whether  you  agree  or  disagree  with  each  statement. 

10.  Before  being  diagnosed  with  cancer,  had  you  wanted  to  have  a  child  (or  another 
child)? 

1  =  Yes 

2  =  No 

11.  Since  having  had  cancer,  have  you  wanted  to  have  a  child  (or  another  child)? 

1 = Yes

2 = No

12.  When  I  see  families  with  children  I  feel  left  out. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

13. I can’t help comparing myself with friends who have children.

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

14. I will do just about anything to have a child (or another child).

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

15.  Having  a  child  (or  another  child)  is  not  necessary  for  my  happiness. 
1  =  Strongly  disagree

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

16.  I  could  visualize  a  happy  life  together,  without  a  child  (or  another  child).

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

6  =  Not  applicable 

17.  We  could  have  a  long,  happy  relationship  without  a  child  (or  another  child). 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

6  =  Not  applicable 


The  next  set  of  questions  relate  to  how  you  view  your  health. 

Mark  the  box  that  best  describes  how  much  you  agree  or  disagree  with  the 
statement. 

18.  No  matter  how  hard  I  try,  my  health  just  doesn’t  turn  out  the  way  I  would  like. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

19.  It  is  difficult  for  me  to  find  effective  solutions  to  the  health  problems  that  come 
my  way. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

20.  I  succeed  in  the  projects  I  undertake  to  improve  my  health.

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

21.  I’m  generally  able  to  accomplish  my  goals  with  respect  to  my  health. 

1  =  Strongly  disagree 
2  =  Disagree

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

22.  I  find  my  efforts  to  change  things  I  don’t  like  about  my  health  are  ineffective.

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

23.  Typically,  my  plans  for  my  health  don’t  work  out  well. 

1  =  Strongly  disagree 

2  =  Disagree 

3  =  Neutral 

4  =  Agree 

5  =  Strongly  agree 

The  next  set  of  questions  ask  about  how  confident  you  are  in  your  ability  to  interact 
with  your  doctor. 

Mark  the  box  about  how  confident  you  are  in  your  ability: 

24.  How  confident  are  you  in  your  ability  to  ask  a  doctor  questions  about  your  chief 
health  concern? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

25.  How  confident  are  you  in  your  ability  to  get  a  doctor  to  answer  all  your 
questions? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

26.  How  confident  are  you  in  your  ability  to  explain  your  chief  health  concern  to  a 
doctor? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

27.  How  confident  are  you  in  your  ability  to  get  a  doctor  to  take  your  chief  health 
concern  seriously? 

1  =  Not  at  all 
2  =  A  little  bit

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

28.  How  confident  are  you  in  your  ability  to  get  a  doctor  to  do  something  about  your 
chief  health  concern? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

29.  How  confident  are  you  in  your  ability  to  ask  a  doctor  for  more  information  if  you 
don’t  understand  what  he  or  she  said? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 


The  next  set  of  questions  is  about  your  relationship  with  others  since  the  end  of 
primary  treatment  (e.g.,  chemotherapy,  radiation,  surgery). 

Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

30.  I  feel  people  avoid  talking  to  me. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Usually 

5  =  Always 

31.  I  feel  isolated  from  others. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Usually 

5  =  Always 

32.  I  have  someone  who  will  listen  to  me  when  I  need  to  talk. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Usually 

5  =  Always 

33.  I  have  someone  who  understands  my  problems. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 
4  =  Usually

5  =  Always 

34.  I  can  get  helpful  advice  from  others  when  dealing  with  a  problem.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

35.  Is  someone  available  to  help  you  if  you  need  it? 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Usually 

5  =  Always 


The  following  questions  ask  about  your  ability  to  perform  at  work. 

Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

36.  Are  you  currently  employed? 

1  =  Yes 

2  =  No 

37.  Current  work  ability  compared  to  your  highest  work  ability  ever: 

How  many  points  would  you  give  your  current  work  ability? 

0  means  that  you  cannot  currently  work  and  5  is  your  work  ability  at  its  best. 

0  1  2  3  4  5

0  =  Completely  unable  to  work;  5  =  Work  ability  at  its  best

38.  Work  ability  in  its  relation  to  the  demands  of  the  job. 

How  do  you  rate  your  current  work  ability  with  respect  to  the  physical  demands 
of  your  work? 

1  =  Very  good 

2  =  Rather  good 

3  =  Moderate 

4  =  Rather  poor 

5  =  Very  poor 

39.  Work  ability  in  its  relation  to  the  demands  of  the  job. 

How  do  you  rate  your  current  work  ability  with  respect  to  the  mental  demands  of 
your  work? 

1  =  Very  good 

2  =  Rather  good 

3  =  Moderate 

4  =  Rather  poor 
5  =  Very  poor


The  next  questions  are  about  your  height  and  weight. 

40.  About  how  much  do  you  weigh  without  shoes? _ 

41.  About  how  tall  are  you  without  shoes? _ 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  7  days:

42.  How  much  did  pain  interfere  with  your  day-to-day  activities? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

43.  How  severe  was  your  pain? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

44.  How  severe  was  your  joint  pain? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

45.  How  much  did  pain  (e.g.,  back  pain,  arm  pain,  hand  pain,  hip  pain,  bone  pain,  muscle 
pain)  affect  your  daily  activities? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

46.  How  much  did  you  experience  burning  and/or  sharp  pain? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 
The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days.
Mark  the  box  that  best  describes  how  you  feel  about  each  statement.

In  the  past  7  days:

47.  I  was  satisfied  with  my  sleep.

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

48.  I  had  difficulty  falling  asleep.

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 
49.  My  sleep  was  restless. 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

50.  I  had  a  problem  with  my  sleep.

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

51.  I  felt  tired.

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 
52.  My  sleep  quality  was. 

1  =  Very  good 

2  =  Good 

3  =  Fair 

4  =  Poor 

5  =  Very  poor 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  7  days:

53.  How  run-down  did  you  feel  on  average?

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

54.  How  fatigued  were  you  on  average? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

55.  To  what  degree  did  you  feel  that  you  had  no  energy? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

56.  How  often  did  you  need  to  rest  during  the  day? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

57.  How  often  did  you  experience  fatigue? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

58.  How  often  did  your  fatigue  come  on  suddenly? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  7  days: 

59.  I  felt  like  nothing  could  cheer  me  up.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 
4  =  Often

5  =  Always 

60.  I  felt  unhappy.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

61.  I  felt  depressed.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

62.  I  felt  that  I  had  nothing  to  look  forward  to.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

63.  I  felt  very  emotional.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

64.  I  felt  tearful  or  like  crying.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  7  days: 

65.  I  felt  anxious.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

66.  I  felt  fearful.

1  =  Never 
2  =  Rarely

3  =  Sometimes 

4  =  Often 

5  =  Always 

67.  I  felt  tense.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

68.  My  worries  overwhelmed  me. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

69.  I  felt  irritable.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

70.  I  felt  worried  about  my  health.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  7  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  7  days:

71.  My  thinking  has  been  slow. 

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 

5  =  Very  often  (Several  times  a  day) 

72.  I  have  had  trouble  shifting  back  and  forth  between  different  activities  that  require
thinking. 

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 
5  =  Very  often  (Several  times  a  day)

73.  My  problems  with  memory,  concentration,  or  making  mental  mistakes  have 
interfered  with  the  quality  of  my  life. 

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 

5  =  Very  often  (Several  times  a  day) 

74.  I  have  had  trouble  concentrating.

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 

5  =  Very  often  (Several  times  a  day) 

75.  My  brain  was  in  a  fog. 

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 

5  =  Very  often  (Several  times  a  day) 

76.  I  have  had  trouble  finding  words  when  talking  to  someone.

1  =  Never 

2  =  Rarely  (Once) 

3  =  Sometimes  (Two  or  three  times) 

4  =  Often  (About  once  a  day) 

5  =  Very  often  (Several  times  a  day) 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  30  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  30  days: 

77.  How  interested  have  you  been  in  sexual  activity? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

78.  How  often  have  you  felt  like  you  wanted  to  have  sex? 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

79.  How  satisfied  have  you  been  with  your  sex  life? 

1  =  Not  at  all 
2  =  A  little  bit

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 

80.  How  much  have  scars  from  surgery  affected  your  satisfaction  with  your  sex  life? 

1  =  Not  at  all 

2  =  A  little  bit 

3  =  Somewhat 

4  =  Quite  a  bit 

5  =  Very  much 


The  next  set  of  questions  are  about  financial  matters  related  to  cancer. 

Indicate  how  often  each  of  these  statements  has  been  true  for  you  in  the  past  30 
days. 

81.  You  had  financial  problems  because  of  the  cost  of  cancer  surgery  or  treatment. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

83.  You  had  problems  with  insurance  because  of  cancer. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

84.  You  had  money  problems  that  arose  because  you  had  cancer. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 

85.  You  had  financial  problems  due  to  a  loss  of  income  as  a  result  of  cancer. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Always 


The  next  set  of  questions  is  about  challenges  you  may  have  had  in  the  past  30  days. 
Mark  the  box  that  best  describes  how  you  feel  about  each  statement. 

In  the  past  30  days:

86.  Did  you  drink  any  type  of  alcoholic  beverage?

1  =  Yes 

2  =  No 

87.  I  took  risks  when  I  drank. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Almost  always 

88.  Drinking  created  problems  between  me  and  others. 

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Almost  always 

89.  I  had  trouble  getting  things  done  after  I  drank.

1  =  Never 

2  =  Rarely 

3  =  Sometimes 

4  =  Often 

5  =  Almost  always 


Please  think  about  what  you  usually  ate  or  drank  during  the  past  month,  that  is,  the 
past  30  days.  Please  read  each  question  and  report  how  many  times  per  day,  week, 
or  month  you  ate  each  food. 

90.  How  many  times  per  day,  week,  or  month  did  you  usually  eat  bacon  or  sausage,  not 
including  low  fat,  light,  or  turkey  varieties? 

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 

8  =  3  times  per  day 

9  =  4  or  more  times  per  day 

91.  How  often  did  you  eat  hot  dogs  made  of  beef  or  pork?

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 
8  =  3  times  per  day

9  =  4  or  more  times  per  day 

92.  How  often  did  you  use  regular  fat  salad  dressing  or  mayonnaise,  including  on 
salad  and  sandwiches?  Do  not  include  low-fat,  light,  or  diet  dressings. 

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 

8  =  3  times  per  day 

9  =  4  or  more  times  per  day 

93.  How  often  did  you  eat  French  fries,  home  fries,  or  hash  brown  potatoes? 

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 

8  =  3  times  per  day 

9  =  4  or  more  times  per  day 

94.  How  often  did  you  eat  peanuts,  walnuts,  seeds,  or  other  nuts?  Do  not  include 
peanut  butter. 

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 

8  =  3  times  per  day 

9  =  4  or  more  times  per  day 

95.  How  often  did  you  eat  regular  fat  potato  chips,  tortilla  chips,  or  corn  chips?  Do
not  include  low-fat  chips.

1  =  Never 

2  =  1-3  times  last  month

3  =  1-2  times  per  week 

4  =  3-4  times  per  week 

5  =  5-6  times  per  week 

6  =  1  time  per  day 

7  =  2  times  per  day 

8  =  3  times  per  day 

9  =  4  or  more  times  per  day 
Below  are  questions  about  needs  that  you  may  have  experienced  as  a  result  of
having  cancer.  Mark  the  box  that  best  describes  whether  you  have  needed  help  with 
these  needs  in  the  last  30  days.  There  are  5  possible  answers  to  choose  from: 


No  Need

1  Not  applicable-  This  was  not  a  problem  for  me  as  a  result  of  cancer.

2  Satisfied-  I  did  need  help  with  this,  but  my  need  for  help  was  satisfied  at  the
time.

Some  Need

3  Low  need-  This  item  caused  me  concern  or  discomfort.  I  had  little  need  for
additional  help.

4  Moderate  need-  This  item  caused  me  concern  or  discomfort.  I  had  some  need
for  additional  help.

5  High  need-  This  item  caused  me  concern  or  discomfort.  I  had  a  strong  need
for  additional  help.

96.  Being  given  written  information  about  important  aspects  of  your  care. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 

97.  Being  given  explanations  of  those  tests  for  which  you  would  like  explanations. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 

98.  Being  adequately  informed  about  the  benefits  and  side-effects  of  treatments 
before  you  choose  to  have  them. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 

99.  Being  informed  about  your  test  results  as  soon  as  feasible. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 

100.  Being  informed  about  things  you  can  do  to  help  yourself  get  well. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 
101.  Being  able  to  judge  the  quality  of  cancer  related  information  provided  on  the
Internet. 

1  =  Not  applicable 

2  =  Satisfied 

3  =  Low  need 

4  =  Moderate  need 

5  =  High  need 

For  the  next  set  of  questions,  use  the  following  as  a  guide  to  describe  your  activity 
level: 


1.  Physical  Inactivity:  The  inactive  person  spends  most  waking  hours  sitting  or 
standing  quietly.  Activities  include  working  at  a  desk,  reading,  watching 
television,  or  other  quiet  pursuits.  Usually  does  not  walk  more  than  a  few 
minutes. 

2.  Light  Physical  Activity:  This  person  usually  walks  more  than  10  minutes  at
a  time  each  day,  leisurely  rides  a  bicycle,  fishes,  bowls,  golfs,  or  engages  in  light 
carpentry,  light  gardening,  light  industrial  work,  teaching,  or  light  housework  on  a 
regular  basis. 

3.  Moderate  Physical  Activity:  This  person  participates  in  such  activities  as 
brisk  walking,  recreation  or  doubles  tennis,  or  swimming;  or  works  in  such 
occupations  as  mail  carrier,  telephone  repair,  light  building,  and  construction;  or
engages  in  housework  and  home  repairs  or  moderate  gardening. 

4.  Heavy  Physical  Activity:  This  person  performs  vigorous  activity  on  a  regular 
basis,  including  jogging,  singles  tennis,  paddleball,  or  high-intensity  aerobics;  or 
engages  in  heavy  activities,  such  as  carrying  heavy  weights  (20  lb  or  more), 
strenuous  farm  work,  or  strenuous  gardening. 

102.  Thinking  about  the  things  you  usually  did  at  work  during  the  last  12  months,  how 
would  you  describe  the  kind  of  physical  activity  you  performed? 

1  =  Inactive 

2  =  Light 

3  =  Moderate 

4  =  Heavy 

103.  Thinking  about  the  things  you  usually  did  at  home  during  the  last  12  months,  how 
would  you  describe  the  kind  of  physical  activity  you  performed? 

1  =  Inactive 

2  =  Light 

3  =  Moderate 

4  =  Heavy 

104.  Thinking  about  the  things  you  usually  did  in  your  leisure  time  during  the  last  12 
months,  how  would  you  describe  the  kind  of  physical  activity  you  performed? 


1  =  Inactive 

2  =  Light 

3  =  Moderate 

4  =  Heavy 


The  next  set  of  questions  is  about  cigarette  smoking. 

Mark  the  box  that  best  describes  your  experience  with  each  statement. 

105.  Have  you  smoked  at  least  100  cigarettes  in  your  entire  life? 

Note:  5  packs  =  100  cigarettes 

1  =  Yes 

2  =  No 

3  =  Don’t  know  /  Not  sure 

106.  Do  you  smoke  cigarettes  every  day,  some  days,  or  not  at  all? 

1  =  Every  day 

2  =  Some  days 

3  =  Not  at  all 

4  =  Don’t  know  /  Not  sure 

107.  During  the  past  12  months,  have  you  stopped  smoking  for  one  day  or  longer  because 
you  were  trying  to  quit  smoking? 

1  =  Yes 

2  =  No 

3  =  Don’t  know  /  Not  sure 

108.  How  long  has  it  been  since  you  last  smoked  a  cigarette,  even  one  or  two  puffs? 

1  =  Within  the  past  month  (less  than  1  month  ago) 

2  =  Within  the  past  3  months  (1  month  but  less  than  3  months  ago) 

3  =  Within  the  past  6  months  (3  months  but  less  than  6  months  ago) 

4  =  Within  the  past  year  (6  months  but  less  than  1  year  ago) 

5  =  Within  the  past  5  years  (1  year  but  less  than  5  years  ago) 

6  =  Within  the  past  10  years  (5  years  but  less  than  10  years  ago) 

7  =  10  years  or  more 

8  =  Don’t  know  /  Not  sure 
Appendix  4:  Center  for  Epidemiologic  Studies  -  Depression  Scale  (CES-D)


Available  in  the  public  domain  http://www.ncbi.nlm.nih.gov/books/NBK64056/
(25) 

Below  is  a  list  of  some  of  the  ways  you  may  have  felt  or  behaved.  Please  indicate  how 
often  you  have  felt  this  way  during  the  past  week. 


1.  I  was  bothered  by  things  that  usually  don’t  bother  me. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

2.  I  did  not  feel  like  eating;  my  appetite  was  poor. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

3.  I  felt  that  I  could  not  shake  off  the  blues  even  with  help  from  my  family  or  friends. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

4.  I  felt  that  I  was  just  as  good  as  other  people. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

5.  I  had  trouble  keeping  my  mind  on  what  I  was  doing. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

6.  I  felt  depressed. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 
7.  I  felt  that  everything  I  did  was  an  effort.

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

8.  I  felt  hopeful  about  the  future. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

9.  I  thought  my  life  had  been  a  failure. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

10.  I  felt  fearful. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

11.  My  sleep  was  restless. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

12.  I  was  happy. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

13.  I  talked  less  than  usual. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

14.  I  felt  lonely. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 
4  =  Most  or  all  of  the  time  (5-7  days)

15.  People  were  unfriendly. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

16.  I  enjoyed  life. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

17.  I  had  crying  spells. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

18.  I  felt  sad. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

19.  I  felt  that  people  disliked  me. 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 

20.  I  could  not  get  “going.” 

1  =  Rarely  or  none  of  the  time  (less  than  1  day) 

2  =  Some  or  a  little  of  the  time  (1-2  days) 

3  =  Occasionally  or  a  moderate  amount  of  the  time  (3-4  days) 

4  =  Most  or  all  of  the  time  (5-7  days) 
Appendix  5:  Behavioral  Risk  Factor  Surveillance  System  (BRFSS)  -
Physical  Activity 


Available  in  the  public  domain  http://www.cdc.gov/brfss/questionnaires/index.htm 
(27) 


1.  During  the  past  month,  other  than  your  regular  job,  did  you  participate  in  any  physical 
activities  or  exercises  such  as  running,  calisthenics,  golf,  gardening,  or  walking  for 
exercise? 

1  =  Yes 

2  =  No 

3  =  Don’t  know  /  Not  sure 


2.  What  type  of  physical  activity  or  exercise  did  you  spend  the  most  time  doing  during 
the  past  month? 

1  =  Active  gaming  devices  (Wii  Fit,  Dance  Dance  Revolution) 

2  =  Aerobics  video  or  classes 

3  =  Backpacking 

4  =  Badminton 

5  =  Basketball 

6  =  Bicycling  machine  exercise 

7  =  Bicycling 

8  =  Boating  (canoeing,  rowing,  kayaking,  sailing  for  pleasure) 

9  =  Bowling 

10  =  Boxing 

11  =  Calisthenics

12  =  Canoeing  /  rowing  in  competition 

13  =  Carpentry 

14  =  Dancing  (ballet,  ballroom,  Latin,  hip  hop,  etc.) 

15  =  Elliptical  /  EFX  machine  exercise 

16  =  Fishing  from  a  river  bank  or  boat 

17  =  Frisbee 

18  =  Gardening  (spading,  weeding,  digging,  filling) 

19  =  Golf  (with  motorized  cart) 

20  =  Golf  (without  motorized  cart) 

21  =  Handball 

22  =  Hiking  cross-country 

23  =  Hockey 

24  =  Horseback  riding 

25  =  Hunting  large  game  such  as  deer  or  elk 

26  =  Hunting  small  game  such  as  quail 

27  =  Inline  skating 

28  =  Jogging 

29  =  Lacrosse

30  =  Mountain  climbing 
31  =  Mowing  lawn

32  =  Paddleball 

33  =  Painting  /  papering  house 

34  =  Pilates 

35  =  Racquetball 

36  =  Raking  lawn 

37  =  Running 

38  =  Rock  climbing 

39  =  Rope  skipping 

40  =  Rowing  machine  exercise 

41  =  Rugby 

42  =  Scuba  diving 

43  =  Skateboarding 

44  =  Skating,  ice  or  roller 

45  =  Sledding,  tobogganing 

46  =  Snorkeling 

47  =  Snow  blowing 

48  =  Snow  shoveling  by  hand 

49  =  Snow  skiing

50  =  Snowshoeing 

51  =  Soccer 

52  =  Softball  /  baseball 

53  =  Squash 

54  =  Stair  climbing  /  stair  master 

55  =  Stream  fishing  in  waders 

56  =  Surfing 

57  =  Swimming 

58  =  Swimming  in  laps 

59  =  Table  tennis 

60  =  Tai  Chi 

61  =  Tennis 

62  =  Touch  football 

63  =  Volleyball 

64  =  Walking 

66  =  Waterskiing 

67  =  Weight  lifting 

68  =  Wrestling 

69  =  Yoga 

70  =  Other  activity 

71  =  Don’t  know  /  not  sure 


3.  How  many  times  per  week  did  you  take  part  in  this  activity  during  the  past  month? 
1  =  1 
2  =  2 
3  =  3 
4  =  4 

5  =  5 

6  =  6 

7  =  7 

8  =  8 
9  =  9 
10  =  10 
11  =  11 
12  =  12 
13  =  13 
14  =  14 
15  =  15 
16  =  16 

17  =  17 

18  =  18 
19  =  19 
20  =  20 
21  =  21 
22  =  22 

23  =  23 

24  =  24 

25  =  25 

26  =  26 

27  =  27 

28  =  28 

29  =  29 

30  =  30 


4.  And  when  you  took  part  in  this  activity,  for  how  many  minutes  did  you  usually  keep  at 
it? 


(answer  provided  in  minutes) 


5.  What  other  type  of  physical  activity  gave  you  the  next  most  exercise  during  the  past 
month? 

1  =  Active  gaming  devices  (Wii  Fit,  Dance  Dance  Revolution) 

2  =  Aerobics  video  or  classes 

3  =  Backpacking 

4  =  Badminton 

5  =  Basketball 

6  =  Bicycling  machine  exercise 

7  =  Bicycling 

8  =  Boating  (canoeing,  rowing,  kayaking,  sailing  for  pleasure) 

9  =  Bowling 
10  =  Boxing 
11  =  Calisthenics 

12  =  Canoeing  /  rowing  in  competition 

13  =  Carpentry 

14  =  Dancing  (ballet,  ballroom,  Latin,  hip  hop,  etc.) 

15  =  Elliptical  /  EFX  machine  exercise 

16  =  Fishing  from  a  river  bank  or  boat 

17  =  Frisbee 

18  =  Gardening  (spading,  weeding,  digging,  filling) 

19  =  Golf  (with  motorized  cart) 

20  =  Golf  (without  motorized  cart) 

21  =  Handball 

22  =  Hiking  cross-country 

23  =  Hockey 

24  =  Horseback  riding 

25  =  Hunting  large  game  such  as  deer  or  elk 

26  =  Hunting  small  game  such  as  quail 

27  =  Inline  skating 

28  =  Jogging 

29  =  Lacrosse 

30  =  Mountain  climbing 
31  =  Mowing  lawn 

32  =  Paddleball 

33  =  Painting  /  papering  house 

34  =  Pilates 

35  =  Racquetball 

36  =  Raking  lawn 

37  =  Running 

38  =  Rock  climbing 

39  =  Rope  skipping 

40  =  Rowing  machine  exercise 

41  =  Rugby 

42  =  Scuba  diving 

43  =  Skateboarding 

44  =  Skating,  ice  or  roller 

45  =  Sledding,  tobogganing 

46  =  Snorkeling 

47  =  Snow  blowing 

48  =  Snow  shoveling  by  hand 

49  =  Snow  skiing 

50  =  Snowshoeing 

51  =  Soccer 

52  =  Softball  /  baseball 

53  =  Squash 

54  =  Stair  climbing  /  stair  master 

55  =  Stream  fishing  in  waders 
56  =  Surfing 

57  =  Swimming 

58  =  Swimming  in  laps 

59  =  Table  tennis 

60  =  Tai  Chi 

61  =  Tennis 

62  =  Touch  football 

63  =  Volleyball 

64  =  Walking 

66  =  Waterskiing 

67  =  Weight  lifting 

68  =  Wrestling 

69  =  Yoga 

70  =  Other  activity 

71  =  Don’t  know  /  not  sure 


6.  How  many  times  per  week  did  you  take  part  in  this  activity  during  the  past  month? 
1  =  1 
2  =  2 

3  =  3 

4  =  4 

5  =  5 

6  =  6 

7  =  7 

8  =  8 
9  =  9 
10  =  10 
11  =  11 
12  =  12 
13  =  13 
14  =  14 
15  =  15 
16  =  16 
17  =  17 
18  =  18 

19  =  19 

20  =  20 
21  =  21 
22  =  22 

23  =  23 

24  =  24 

25  =  25 

26  =  26 

27  =  27 

28  =  28 
29  =  29 

30  =  30 


7.  And  when  you  took  part  in  this  activity,  for  how  many  minutes  did  you  usually  keep  at 
it? 